Table of Contents

Foreword and Preface
Introduction
History of the Human Genome Project
DOE-NIH Coordination
Scientific Five-Year Goals of the U.S. Human Genome Project
Highlights of Research Progress
Mapping
Informatics
Sequencing
Activities Addressing Ethical, Legal, and Social Issues Related to Human Genome Project Data
Technology Transfer and Industrial Collaboration
Human Genome Center Research Narratives
Lawrence Berkeley Laboratory
Lawrence Livermore National Laboratory
Los Alamos National Laboratory
Program Management Infrastructure
DOE OHER Mission
Program Management Task Group
Field Coordination
Human Genome Coordinating Committee
Human Genome Management Information System
Human Genome Distinguished Postdoctoral Fellowships
Resource Allocation
Interagency Coordination
Joint DOE-NIH Activities
Joint Mapping Working Group
Joint Informatics Task Force
Joint Sequencing Working Group
Joint Working Group on Ethical, Legal, and Social Issues
Joint Working Group on the Mouse
Other U.S. Genome Research
U.S. Department of Agriculture
National Science Foundation
Howard Hughes Medical Institute
International Coordination
HUGO: Worldwide Genome Research Coordination
UNESCO: Promoting the Interests of Developing Countries
Appendices
A. Primer on Molecular Genetics
B. Conferences, Meetings, and Workshops Sponsored by DOE
C. Members of the DOE Health and Environmental Research Advisory Committee
D. Members of the DOE-NIH Joint Working Groups
E. Glossary
Index to Principal and Coinvestigators Listed in Abstracts
Acronym List

Foreword

Acquiring complete knowledge of the organization, structure, and function of the human genome, the master blueprint of each of us, is the broad aim of the Human Genome Project. It is a new kind of program in biology, both in its size and focus on a limited set of goals and in its dependence on the development and use of technology. The coordinated U.S.
Human Genome Project was officially initiated by the Department of Energy (DOE) Office of Health and Environmental Research and the National Institutes of Health (NIH) National Center for Human Genome Research (NCHGR) in FY 1991 with the publication in April 1990 of Understanding Our Genetic Inheritance; The U.S. Human Genome Project: The First Five Years 1991-1995. The DOE effort, which began very modestly almost 4 years before, is now over 5 years old. Taking stock of what has been done and what remains to be done is particularly appropriate at this time. That the ambitious scientific goal of the Human Genome Project can now be imagined is the result of the revolution occurring in biology during the last 20 years. Modern biological science has achieved a profound but still quite incomplete level of understanding of how the diversity of all living things is determined. This insight, along with scientific and technical advances in other fields, has brought unprecedented power both in being able to analyze and manipulate genetic structures and to use and store large quantities of genetic information. DOE is uniquely positioned to bring together expertise in physics, chemistry, engineering, and computer science to help solve fundamental biological problems and to exploit exciting opportunities presented by the Human Genome Project. Genome research will also contribute to the department's role in providing the scientific foundation for understanding the health effects of radiation and of chemical insults to the genome. The DOE program stresses mapping, the development of sequencing technologies and instrumentation, and informatics. Informatics refers to computational approaches in acquiring, storing, distributing, analyzing, and manipulating vast amounts of mapping and sequence data that will result from the project. 
Another important program component studies the ethical, legal, and social issues arising from use of the generated data, particularly in the privacy and confidentiality of genetic information. Cutting across all DOE biological and environmental research programs are several science education activities. The Human Genome Project is a closely cooperative activity between NIH and DOE. NCHGR is an important and essential participant. Internationally, the formation of the Human Genome Organization and the establishment of national genome projects by an increasing number of countries indicate the fascination and promise of this effort on the collective imaginations of many nations. In addition to the inherent excitement about increased knowledge of human life, the project offers the promise of many new opportunities for benefiting humanity through the development of new diagnostics, pharmaceuticals, and therapies for a multitude of human diseases; a wide range of improvements will flow from other biotechnology advances. Further expected benefits include improved risk assessment for individuals and populations exposed to agents that impact genetic material, as well as possible applications of the data to environmental and remediation issues. To be successful, the program must continue to focus on clear objectives for mapping and sequencing and to incorporate the flow of technological developments into the efforts of all working laboratories. Strategies must be planned carefully and in a comprehensive fashion as the next phase begins, in which mapping and sequencing results proliferate and technologies mature. Planning must be project-wide and include interagency planning at ever-earlier stages. This report describes the status of the DOE Human Genome Program and its accomplishments to date. 
Research highlights are noted from the program as a whole and from the three principal DOE human genome centers at Lawrence Berkeley Laboratory, Lawrence Livermore National Laboratory, and Los Alamos National Laboratory. These national laboratory facilities of DOE have been especially successful because they are organized to focus efforts, foster interdisciplinary projects, and use advanced technologies, some developed for other purposes, toward program goals. Essential work is also reported from 41 different research universities. Remarkable progress has been made in advanced instrumentation and informatics. A further indication of the increasing development of the DOE program is the simple statistic that the 1989-90 report had 157 pages and included 57 abstracts of work involving 211 scientists. The current program report contains over 240 pages and includes more than 150 abstracts of work involving over 400 investigators, essentially a doubling of DOE program size. The Human Genome Project ultimately will create scientific resources for the next wave of advances in biology and medicine. As the project is completed, accomplishments will dwarf those that have occurred in the biological sciences since the advent of recombinant DNA technologies. By the same token, the ethical and social consequences of the uses of this new knowledge must be considered as the knowledge is acquired; if this knowledge is responsibly obtained and applied, the next decade of biological research will be history's most fruitful and rewarding by any measure. David J. Galas, Associate Director Office of Health and Environmental Research Office of Energy Research U.S. Department of Energy Preface This is the third report summarizing the Department of Energy (DOE) Human Genome Program, its content, progress, and accomplishments. 
Since the program's conception in 1986 and initiation in 1987 by the DOE Office of Health and Environmental Research (OHER), its broad objectives have rapidly gained both national and international support. The program has made important strides in the development and application of technologies and tools that are required for the cost-effective characterization of the molecular nature of the human genome. This country's Human Genome Project is jointly administered by OHER and the National Center for Human Genome Research of the National Institutes of Health. A successful effort to characterize the molecular nature of human inheritance will require continuing international cooperation involving scientists from many countries. A number of other nations have begun substantial efforts to map and sequence the human genome and those of key model organisms. Although intellectual property issues threaten some aspects of international cooperation, increasing exchange of information has led to more involvement of the international community in discovery, acceleration of the pace of the research, and increased cost-effectiveness. International communication is facilitated by regular meetings to update the maps of individual chromosomes and by contributions to databases such as the Genome Data Base and nucleic acid sequence databanks. Through such databases a worldwide data aggregation and distribution system is being developed to exchange information regarding the genome. Aided by funding from the Human Genome Project, serious study is under way on ethical, legal, and social issues that are becoming more urgent because of the rapid growth in knowledge of human genetics. It is important to develop and disseminate deeper and more widespread understanding of these dynamic issues and of the choices available for families, the law, and society. An educated public is required to make intelligent choices in this area. 
The national genome project is now the largest provider of funds for study of such issues. A key to the long-term success of the program is the initial phase of intensive resource and technology development that requires input and involvement from many scientific and engineering disciplines. Exciting contributions have already been made to biomedical knowledge and biotechnology, and such advances are certain to continue at an ever-increasing rate. Announcements of discovery of important disease genes have become commonplace. Within 10 years nearly all the perhaps 100,000 genes that make up the human genome are likely to be found. Within 15 years the program is expected to culminate in a reference DNA sequence of the entire genome. Never has such a mass of data flowed into biology and medicine. An understanding of how genetic variations account for much of the richness and adventure of human diversity will be greatly increased. More practically, there can be little doubt of tremendous payoffs in terms of diagnoses and, ultimately, specific therapies for many human diseases. Moreover, new technologies and rapidly developing analytical tools to characterize the human genome will have widespread impact beyond human health. They will find application in revealing the genetic inheritance of many organisms of potential scientific and commercial interest and will provide an important stimulus to broaden and deepen the impact of modern biology in areas such as energy, environmental protection and waste treatment, agriculture, and the materials sciences. Of particular importance is the facile access to proteins that rapidly follows discovery of their genes. As a result of genome projects, we will soon be in a position to begin the systematic large-scale characterization of proteins and their structure. 
The interplay of molecular biology, structural studies, high-performance computing, and advanced molecular graphics will certainly lead to an understanding of macromolecular structure-function relationships. The scientific and economic implications of such a predictive understanding cannot be overestimated. It is the key to full realization of the potential of modern biology. Intense X-ray light and neutrons produced by unique, large, and expensive machines (synchrotrons and reactors) at DOE laboratories are important national resources for the determination of biological structure and, hence, for the national effort in biotechnology. A central goal of OHER is to provide access to these machines by making facilities and technical support available to structural-biology users, a need that has been projected to increase tenfold in the next several years.

Finally, as Robert Sinsheimer elegantly pointed out in The FASEB Journal (November 1991), the Human Genome Project is an epic venture of discovery that will in time clarify many endlessly and fruitlessly debated mysteries of human nature. With this project we are launched upon a new stage of the age-old quest to illuminate the record of the human past: the prehistory of our species as recorded in the genetic script or blueprint for our being. When complete, the project will have provided us with an unprecedented resource: the complete text of our genetic endowment. It will be seen as a turning point in human history.

David A. Smith, Director
Health Effects and Life Sciences Research Division
Office of Health and Environmental Research
Office of Energy Research
U.S. Department of Energy

Acknowledgements

The DOE Office of Health and Environmental Research gratefully acknowledges the contributions made by genome research grantees and contractors in submitting abstracts, photographs, captions, and narratives.
The Human Genome Management Information System at Oak Ridge National Laboratory (managed by Martin Marietta Energy Systems, Inc., for the U.S. Department of Energy under contract DE-AC05-84OR21400) collected and organized the information, prepared the manuscript, and implemented the design and production of this publication.

Introduction

The U.S. Human Genome Project is the national coordinated 15-year effort to characterize all the human genetic material (the genome) by improving existing human genetic maps, constructing physical maps of entire chromosomes, and ultimately determining the complete sequence of the deoxyribonucleic acid (DNA) subunits in the human genome. Parallel studies are being carried out on selected model organisms to facilitate the interpretation of human gene function. The ultimate goal of the U.S. project is to discover all of the more than 100,000 human genes and render them accessible for further biological study.

Current technology could probably be used to attain the objectives of the Human Genome Project, but the cost and time required would be unacceptable. For this reason, a major feature of the first 10 years of the project is to optimize existing methods and develop new technology to increase efficiency in DNA mapping and sequencing by 1 or 2 orders of magnitude. The genome will eventually be sequenced using continually evolving technologies and revolutionary methods not in existence today.

Information obtained as part of the Human Genome Project will dramatically change almost all biological and medical research and dwarf the catalog of current genetic knowledge. In addition, both the methods and the data developed as part of the project are likely to benefit investigations of many other genomes, including a large number of commercially important plants and animals.

For more information on the science of genomics, see Appendix A, "Primer on Molecular Genetics," p. 191. Terms are defined in the Glossary, p. 229.
An acronym list is on the inside back cover.

History of the DOE Human Genome Program

A brief history of the U.S. Department of Energy (DOE) Human Genome Program will be useful in a discussion of the objectives of the DOE program as well as those of the collaborative U.S. Human Genome Project. The Office of Health and Environmental Research (OHER) of DOE and its predecessor agencies (the Atomic Energy Commission and the Energy Research and Development Administration) have long sponsored research into genetics, both in microbial systems and in mammals, including basic studies on genome structure, replication, damage, and repair and the consequences of genetic mutations.

In 1984, OHER and the International Commission on Protection Against Environmental Mutagens and Carcinogens cosponsored a conference in Alta, Utah, which highlighted the growing roles of recombinant DNA technologies. Substantial portions of the meeting's proceedings were incorporated into the Congressional Office of Technology Assessment report, Technologies for Detecting Heritable Mutations in Humans, in which the value of a reference sequence of the human genome was recognized. Acquisition of such a reference sequence was, however, far beyond the capabilities of biomedical research resources and infrastructure existing at that time. Although the small genomes of several microbes had been mapped or partially sequenced, the detailed mapping and eventual sequencing of 24 distinct human chromosomes (22 autosomes and the sex chromosomes X and Y) that together comprise an estimated 3 billion subunits was a task thousands of times larger.

DOE OHER was already engaged in several multidisciplinary projects contributing to the nation's biomedical capabilities, including the GenBank® DNA sequence repository, which was initiated and sustained by DOE computer and data-management expertise. Several major user facilities supporting microstructure research were developed and are maintained by DOE (see box, p. 55).
Unique chromosome-processing resources and capabilities were in place at Los Alamos National Laboratory and Lawrence Livermore National Laboratory. Among these were the fluorescence-activated cell sorter (FACS) systems to purify human chromosomes within the National Laboratory Gene Library Project for the production of libraries of DNA clones. The availability of these monochromosomal libraries opened an important path: a practical means of subdividing the huge total genome into 24 much more manageable components.

With these capabilities, OHER began in 1986 to consider the feasibility of a dedicated human genome program. Leading scientists were invited to the March 1986 international conference at Santa Fe, New Mexico, to assess the desirability and feasibility of implementing such a project. With virtual unanimity, participants agreed that ordering and eventually sequencing DNA clones representing the human genome were desirable and feasible goals. With the receipt of this enthusiastic response, OHER initiated several pilot projects. Program guidance was further sought from the DOE Health Effects Research Advisory Committee (HERAC, see Appendix C for a list of current members).

The HERAC Recommendation. The April 1987 HERAC report recommended that DOE and the nation commit to a large, multidisciplinary, scientific, and technological undertaking to map and sequence the human genome. DOE was particularly well suited to focus on resource and technology development, the report noted; HERAC further recommended a leadership role for DOE because of its demonstrated expertise in managing complex and long-term multidisciplinary projects involving both the development of new technologies and the coordination of efforts in industries, universities, and its own laboratories.
Evolution of the nation's Human Genome Project further benefited from a 1988 study by the National Research Council (NRC) entitled Mapping and Sequencing the Human Genome, which recommended that the United States support this research effort and presented an outline for a multiphase plan. DOE-NIH Coordination The National Institutes of Health (NIH) was a necessary participant in the large-scale effort to map and sequence the human genome because of its long history of support for biomedical research and its vast community of scientists. This was confirmed by the NRC report, which recommended a major role for NIH. In 1987, under the leadership of Director James Wyngaarden, NIH established the Office of Genome Research in the Director's Office. In 1989 this office became the National Center for Human Genome Research (NCHGR), directed by James D. Watson. After Watson's resignation in April 1992, Michael Gottesman was appointed NCHGR Acting Director. In addition to extramural support for research projects in physical mapping and the development of index linkage markers and technology, NIH also provides support for genetic mapping based on family studies and, following NRC recommendations, for studies on several relevant model organisms. DOE-supported genome research is focused almost exclusively on the human genome through support of large-scale physical mapping, resource and instrumentation technology development, and improvements in computational and database capabilities and research infrastructure. A significant portion of the DOE Human Genome Program is allocated to the DOE national laboratories. In several important areas, DOE and NIH cooperate to support critical resources such as the Genome Data Base (GDB) at Johns Hopkins University. Cofunded since 1991 as the central international repository of human chromosome mapping data, GDB is expected to receive supporting funds from other nations. 
DOE and NIH also cooperate to support joint workshops; a number of ethical, legal, and social issues projects; and the Human Genome News newsletter. Joint task groups under the DOE-NIH Joint Subcommittee on the Human Genome meet periodically to define program needs and develop recommendations for their parent DOE and NIH committees. OHER and NCHGR cosponsor workshops and meetings of the task groups on mapping; sequencing; informatics; the use of the mouse as a mammalian model; and (in a departure from most scientific programs) ethical, legal, and social issues related to data produced in the project.

Many other highlights of the DOE OHER program follow in the succeeding sections of this report, including reports from the human genome centers; further details of program infrastructure, management, and coordination; resource allocation; and abstracts of individual research projects.

Scientific Five-Year Goals of the U.S. Human Genome Project from the NIH-DOE Five Year Plan*
[Implemented October 1, 1990 (FY 1991)]

1. Mapping and Sequencing the Human Genome

   Genetic Mapping
      Complete a fully connected human genetic map with markers spaced an average of 2 to 5 cM apart.
      Identify each marker by a sequence tagged site (STS).

   Physical Mapping
      Assemble STS maps of all human chromosomes with the goal of having markers spaced at approximately 100,000-bp intervals.
      Generate overlapping sets of cloned DNA or closely spaced unambiguously ordered markers with continuity over lengths of 2 Mb for large parts of the human genome.

   DNA Sequencing
      Improve current and develop new methods for DNA sequencing that will allow large-scale sequencing of DNA at a cost of $0.50 per base pair.
      Determine the sequence of an aggregate of 10 Mb of human DNA in large continuous stretches in the course of technology development and validation.

2. Model Organisms
      Prepare a mouse genome genetic map based on DNA markers.
      Start physical mapping on one or two chromosomes.
      Sequence an aggregate of about 20 Mb of DNA from a variety of model organisms, focusing on stretches that are 1 Mb long, in the course of developing and validating new and improved DNA sequencing technology.

3. Informatics: Data Collection and Analysis
      Develop effective software and database designs to support large-scale mapping and sequencing projects.
      Create database tools that provide easy access to up-to-date physical mapping, genetic mapping, chromosome mapping, and sequencing information and allow ready comparison of the data in these several data sets.
      Develop algorithms and analytical tools that can be used in the interpretation of genomic information.

4. Ethical, Legal, and Social Considerations
      Develop programs directed toward understanding the ethical, legal, and social implications of Human Genome Project data.
      Identify and define the major issues and develop initial policy options to address them.

5. Research Training
      Support research training of pre- and postdoctoral fellows starting in FY 1990.
      Increase the number of trainees supported until a steady state of about 600 per year is reached by the fifth year.
      Examine the need for other types of research training in the next year (FY 1991).

6. Technology Development
      Support automated instrumentation and innovative and high-risk technological developments as well as improvements in current technology to meet the needs of the genome project as a whole.

7. Technology Transfer
      Enhance the already close working relationships with industry.
      Encourage and facilitate the transfer of technologies and of medically important information to the medical community.

*Understanding Our Genetic Inheritance; The U.S. Human Genome Project: The First Five Years FY 1991-1995, DOE/ER-0452P, U.S. Department of Health and Human Services and U.S. Department of Energy, April 1990.

Highlights of Research Progress

Mapping

A major goal for DOE and NIH, as stated in the Five Year Plan (p.
5) for the Human Genome Project officially implemented in FY 1991, is to develop refined physical maps of chromosomes. Increasingly detailed maps will provide biomedical scientists with rapid access to important areas on chromosomes through their specific markers and ordered sets of DNA clones. Page numbers for research abstracts of investigators noted in parentheses can be located in the "Index to Principal and Coinvestigators Listed in Abstracts," p. 243.

Physical Map Construction

DOE sponsors both extensive physical mapping studies and supportive resource and technology development. Physical mapping of chromosomes 5, 11, 16, 17, 19, 21, 22, and X has been or is being supported directly. Increasingly detailed maps facilitate access to important chromosomal loci through their constituent markers and ordered DNA clones. The earliest concerted mapping efforts began on chromosome 16 at the Los Alamos National Laboratory (LANL) Center for Human Genome Studies and on chromosome 19 at the Lawrence Livermore National Laboratory (LLNL) Human Genome Center. These efforts have achieved excellent progress (see detailed narratives, pp. 46 and 36, respectively) through the development of effective multidisciplinary teams and efficient methods for generating clone "fingerprints." The fingerprints provide data for recognizing clone pairs that overlap, facilitating the construction of increasingly larger sets of overlapping clones, called contigs. Approximately 90% of chromosomes 16 and 19 is now represented by fingerprinted clones, and multiclone contigs span at least 80% of their length. Initial contig assembly methodologies are complemented by strategies designed to finish the physical maps and align them with genetic maps.
This progress, together with the many contributions from other research groups (presented in the Abstracts section of this report), shows that resources and technologies required to achieve the mapping goals stated in the Five Year Plan are rapidly being realized.

National Laboratory Gene Library Project (NLGLP)

Among the resources most crucial to mapping progress are the libraries of clones representing each of the human chromosomes. Their availability reduces the total genome mapping effort to 24 smaller, more-manageable mapping projects. This chromosome-specific clone library production from physically purified chromosomes depends on the unique LANL and LLNL chromosome-sorting facilities maintained through the DOE NLGLP. These library resources are distributed either from the laboratories or through the American Type Culture Collection. As of December 1991 over 620 chromosome-specific libraries were distributed as resources for entire chromosome mapping efforts and for more-selective gene hunts. Current library production is focused on the needs of the major chromosome mapping projects (L. Deaven, LANL; P. de Jong, LLNL).

Recombinant Clone Types

Other biological resources are also being developed to further chromosome mapping progress. These resources include several useful genetic elements or recombinant DNAs and their cellular hosts. The largest elements are the intact, single human chromosomes maintained in somatic cell hybrids, such as single human chromosome/hamster-host cell hybrids. They are valuable for sorting out the human chromosomes for construction of single-chromosome libraries. Insert sizes of recombinants range from millions to a few hundred bases. Recombinant cosmid clones with 40- to 50-kb human DNA inserts predominated in the early contig-building efforts and continue to be a basic resource (refer to Abstracts: Resource Development, p. 82).
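
As a rough illustration of the contig-building efforts mentioned above, in which clone fingerprints are compared to recognize overlapping pairs, the sketch below groups clones into contigs. It is a minimal sketch under stated assumptions and not the method used by the LANL or LLNL groups: a fingerprint is taken to be a list of fragment sizes, two clones are called overlapping when they share at least `min_shared` fragments within a relative size `tolerance`, and both thresholds, the function names, and the clone data are hypothetical.

```python
from itertools import combinations


def shared_fragments(fp_a, fp_b, tolerance=0.02):
    """Count fragments of fp_a that pair with a distinct fragment of fp_b.

    Two fragment sizes match when they differ by at most `tolerance`
    (relative to the larger size). The threshold is an assumed value.
    """
    remaining = sorted(fp_b)
    count = 0
    for size in sorted(fp_a):
        for i, other in enumerate(remaining):
            if abs(size - other) <= tolerance * max(size, other):
                count += 1
                del remaining[i]  # each fragment may be matched only once
                break
    return count


def build_contigs(fingerprints, min_shared=3, tolerance=0.02):
    """Group clones into contigs: connected components of the overlap graph."""
    parent = {name: name for name in fingerprints}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(fingerprints, 2):
        if shared_fragments(fingerprints[a], fingerprints[b], tolerance) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for name in fingerprints:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical fingerprints (fragment sizes in base pairs).
clones = {
    "c1": [1200, 2500, 3100, 4800, 7000],
    "c2": [2500, 3100, 4800, 5600, 6400, 9100],
    "c3": [1500, 2050, 5600, 6400, 9100],
    "c4": [800, 1750, 3900, 6100],
}
contigs = build_contigs(clones)
print(contigs)
```

With these made-up inputs, c1 overlaps c2 and c2 overlaps c3, so the three clones fall into one contig even though c1 and c3 share no fragments, while c4 remains a singleton. Real fingerprint comparison would also have to weigh the chance of coincidental matches, which this sketch ignores.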
Monochromosomal Yeast Artificial Chromosomes (YACs)

YACs with inserts of 200 kb and larger, whose initial development was pioneered with NIH support, are now widely used in physical mapping projects. The recently developed capability to produce YACs from flow-sorted chromosomes is making available monochromosomal YAC libraries to speed mapping projects (M. McCormick, L. Deaven, and R. Moyzis, LANL). These libraries are made up of YACs containing human DNA inserts. This contrasts with libraries made from somatic cell hybrids, which are made up of YACs that contain mostly nonhuman DNA inserts.

Clone Library Array and Analysis

When user laboratories maintain clone libraries in the same arrayed-format addressing system, the information obtained from these libraries is maximized because the accumulated data from different laboratories can be readily combined. The tedious task of arraying thousands of DNA clones has been greatly alleviated through the development and implementation of automated or robotic processing systems (T. Beugelsdijk and P. Medvick, LANL; J. Jaklevic, Lawrence Berkeley Laboratory (LBL); and A. Olsen, LLNL). These systems are being increasingly utilized in clone analyses and in comparisons needed for overlap detection.

Multiplexed Clone Overlap Detection

Overlap detection of sequence homologies by DNA hybridization is speeded by multiplexing strategies in which the processing of pools of clones or their derivative probes replaces the more tedious analysis of individual clones. Multiplexing was first implemented by the chromosome 11 mapping group (G. Evans, Salk Institute for Biological Studies). Several second-generation multiplexing schemes are now being implemented to speed overlap detection both within libraries and between members of different types of libraries (J. F. Cheng, LBL; P. de Jong, LLNL).
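
The idea of testing pools instead of individual clones can be sketched with a simple row-and-column scheme on an arrayed library. This is an assumed, generic pooling design chosen for illustration; the report does not describe the specific schemes used by the groups named above, and the plate layout, clone names, and function names are hypothetical.

```python
def make_pools(plate):
    """Build one pool per row and one per column of an arrayed clone plate."""
    row_pools = [set(row) for row in plate]
    col_pools = [set(col) for col in zip(*plate)]
    return row_pools, col_pools


def screen(pools, true_positives):
    """Simulate hybridization: a pool is positive if it holds any positive clone."""
    return [bool(pool & true_positives) for pool in pools]


def decode(plate, row_hits, col_hits):
    """Return candidate clones at intersections of positive rows and columns."""
    return sorted(
        plate[r][c]
        for r, row_hit in enumerate(row_hits) if row_hit
        for c, col_hit in enumerate(col_hits) if col_hit
    )


# Hypothetical 3 x 4 array: clones A1..A4, B1..B4, C1..C4.
plate = [[f"{row}{col}" for col in range(1, 5)] for row in "ABC"]
row_pools, col_pools = make_pools(plate)

# One positive clone: 7 pooled tests (3 rows + 4 columns) instead of 12.
single = decode(plate,
                screen(row_pools, {"B3"}),
                screen(col_pools, {"B3"}))

# Two positive clones: the intersections include false candidates.
double = decode(plate,
                screen(row_pools, {"A1", "B3"}),
                screen(col_pools, {"A1", "B3"}))
print(single, double)
```

A single positive clone is identified directly, while two positives yield four candidates that would need individual follow-up tests. That ambiguity is one reason more elaborate pooling designs exist.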
Messenger RNA/cDNAs Used To Generate Sequence Tagged Sites (STSs)

STS marking of DNA clones provides a common language for uniting the results obtained with different types of recombinant DNAs and varied approaches to map generation. An STS is a short, unique DNA sequence (generally 100 to 300 bp) that distinguishes a chromosomal locus. The STS segment can be selectively amplified within the entire genome by the polymerase chain reaction to provide an identifying tag for any DNA clone containing the site. DOE is emphasizing the use of STSs for expressed genes, as represented by their derivative cDNAs. Mapping these STSs onto contigs and to their chromosomal loci is thus rapidly placing genes on the developing chromosome maps (refer to Abstracts: Resource Development, p. 82).

Microdissection Libraries

Chromosome microdissection can facilitate region-specific mapping efforts, such as the localized ordering of clones on the much longer chromosomes, by identifying sets of clones derived from the specific region. Region-specific probes can also serve in the identification of locally expressed genes by selectively displaying their counterparts within complex cDNA libraries (F.-T. Kao, Eleanor Roosevelt Institute).

Libraries of Hybrid Somatic Cells with Partial Human Chromosomes

Aberrant chromosomes arising from rearrangement processes can be moved into host rodent cells, providing for the maintenance of a human subchromosomal segment. A large hybrid set has been assembled for chromosome 16 (G. Sutherland, Adelaide Children's Hospital, South Australia). These partial chromosomes together define over 100 chromosomal segment "bins" to which clones, contigs, and other DNA markers can be assigned by DNA hybridization tests. This resource system is greatly speeding the completion of the chromosome 16 map.
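
The bin assignment described above can be pictured as a lookup: each hybrid retains some segment of the chromosome, the segment boundaries cut the chromosome into bins, and a marker's pattern of positive and negative hybridization results points to one bin. The sketch below is only an illustration under assumed inputs. The hybrid names, the interval coordinates (arbitrary units), and the function names are hypothetical and are not taken from the chromosome 16 panel.

```python
def build_bins(hybrids):
    """Map each hybridization signature to the chromosomal bins it implies.

    `hybrids` maps a hybrid name to the (start, end) interval of human
    chromosome it retains. Bins are the gaps between consecutive interval
    boundaries; a bin's signature is the set of hybrids that retain it.
    """
    boundaries = sorted({edge for interval in hybrids.values() for edge in interval})
    bins = {}
    for left, right in zip(boundaries, boundaries[1:]):
        signature = frozenset(
            name for name, (start, end) in hybrids.items()
            if start <= left and right <= end
        )
        bins.setdefault(signature, []).append((left, right))
    return bins


def assign_marker(bins, positive_hybrids):
    """Return the bin(s) consistent with the hybrids a marker hybridizes to."""
    return bins.get(frozenset(positive_hybrids), [])


# Hypothetical panel of three hybrids with overlapping retained segments.
panel = {"H1": (0, 40), "H2": (25, 90), "H3": (60, 90)}
bins = build_bins(panel)

in_overlap = assign_marker(bins, {"H1", "H2"})   # positive in H1 and H2 only
inconsistent = assign_marker(bins, {"H3"})       # no bin is retained by H3 alone
print(in_overlap, inconsistent)
```

In this toy panel, a marker positive in H1 and H2 but not H3 is placed in the interval (25, 40). A pattern that matches no bin returns an empty list, which in practice would suggest an experimental error or an incomplete description of the panel.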
Fluorescence In Situ Hybridization (FISH) The previous mapping of DNA clones by FISH onto metaphase chromosomes has now been extended to the much less condensed interphase and pronuclear DNAs. Mapping onto less-condensed chromosomes increases spatial resolution and the capacity to order closely spaced markers. As a component of evolving mapping strategies, FISH is serving to locate and orient cosmid contigs on intact chromosomes and to measure distances between the cosmids as well as to mapped cDNAs (J. Gray, University of California; J. Korenberg, Cedars-Sinai Medical Center; B. Trask, LLNL). Fragile X Locus Cloned The fragile X locus has been cloned and its mode of action is being characterized (C. T. Caskey and D. L. Nelson, Baylor College of Medicine; and collaborators). Fragile X syndrome may be the most common form of inherited mental retardation. About 1 in 1500 males and 1 in 2500 females are affected by the syndrome, which is caused by a high mutation frequency at the fragile X locus. Myotonic Dystrophy Locus Cloned The gene responsible for myotonic dystrophy, an autosomal dominant disease, has been identified and cloned. The structural defect is characterized by a tandemly repeated segment of DNA within or close to the coding region on 19q13.3. The extent of the amplified region appears to be associated with the severity of the disease (C. T. Caskey, Baylor College of Medicine; P. de Jong and A. Carrano, LLNL; and collaborators). Informatics Multiple informatics capabilities will be crucial to the successful application of data derived from the genome project. Informatics expertise, software, and hardware are being developed in the following areas: chromosome map assembly, databases, DNA sequence analysis, and laboratory automation. Map Assembly Algorithms for automatically assembling physical maps from clone fingerprint data have been further improved (E. Branscomb, LLNL; M. Cinkosky, V. Faber, J. Fickett, and D. Torney, LANL). 
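To make the idea of assembling maps from fingerprint data concrete, the following minimal sketch groups clones into contigs when their restriction-fragment fingerprints share enough fragment sizes. It is not the algorithm developed at LLNL or LANL, which is not described here; the fixed threshold `min_shared`, the 2% sizing tolerance, and the example fragment sizes are all assumptions made for illustration.

```python
def shared_fragments(fp_a, fp_b, tol=0.02):
    """Count fragment sizes that match between two fingerprints within a relative tolerance."""
    a, b = sorted(fp_a), sorted(fp_b)
    i = j = shared = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol * max(a[i], b[j]):
            shared += 1          # each fragment is matched at most once
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return shared


def build_contigs(fingerprints, min_shared=3):
    """Join clones whose fingerprints share at least min_shared fragments (union-find)."""
    names = list(fingerprints)
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for idx, a in enumerate(names):
        for b in names[idx + 1:]:
            if shared_fragments(fingerprints[a], fingerprints[b]) >= min_shared:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical fingerprints (fragment sizes in bp) for four clones.
clones = {
    "c1": [500, 800, 1200, 2300, 4100],
    "c2": [950, 1200, 2300, 3000, 4100],
    "c3": [150, 950, 3000, 4100, 6100],
    "c4": [333, 444, 5555],
}
contigs = build_contigs(clones)
```

Here c1 and c3 share only one fragment but are placed in the same contig through c2, while c4 remains a singleton. A production method would typically replace the fixed threshold with a statistical estimate of overlap probability.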
Software permitting fast parallel computations on multiple computers was developed to speed computation-intensive mapping analyses (E. Branscomb, LLNL). A computer communication and interrogation system is being assembled to minimize redundancy during the production of STS chromosomal markers from cDNAs. Participating laboratories will rapidly query distant databases to determine the novelty of a candidate mRNA/cDNA before further pursuing the STS-generation process. Databases Graphical interfaces for mapping databases were constructed to display several different types of aligned chromosomal data and provide expandable views [R. Douthart, Pacific Northwest Laboratory (PNL); J. Fickett, LANL; S. Lewis, Lawrence Berkeley Laboratory (LBL); R. Overbeek, Argonne National Laboratory (ANL)]. The electronic Laboratory Notebook database and similar databases are being continuously expanded to include new data types as mapping strategies evolve (J. Fickett, LANL). The internationally available Genome Data Base (GDB), housed at Johns Hopkins University and cofunded since September 1991 by DOE and NIH, is the primary reference database for human chromosome mapping data produced in the United States and abroad. The organizational structure of GDB is shown on the opposite page (P. Pearson, GDB). In a collaboration between LLNL and GDB, computer system interfaces have been devised for automatically transferring large amounts of data from mapping centers to GDB for integration into and updating of chromosome maps. Enhancements of the GenBank® DNA sequence database located at LANL continue. Primarily supported by NIH with contributions from DOE, GenBank exchanges data daily with European and Japanese databases. GenBank has expanded its electronic data-publishing facilities and has reached agreements with a number of journals to facilitate electronic publication of large volumes of DNA sequence data (J. Cassatt, NIH). 
Sequence Analysis gm, developed at New Mexico State University, is the first DNA sequence analysis algorithm capable of recognizing and ordering the set of protein-coding regions (exons) from among the noncoding regions (introns) comprising a gene, rather than predicting isolated protein-coding sequences. gm has been distributed to laboratories worldwide (C. Fields, now at NIH, and C. Soderlund, now at LANL). Gene Recognition and Analysis Internet Link (GRAIL), a novel neural network-based algorithm for identifying exons within DNA sequences, is online at Oak Ridge National Laboratory (ORNL) to serve the biological community by automatically analyzing sequences. From a number of examples, this artificial intelligence system learns several distinct sequence characteristics through which exons can be recognized. GRAIL automatically accepts input sequences sent to ORNL over Internet and returns the output analysis to the sender (R. Mural and E. Uberbacher, ORNL). Laboratory Automation Advances continue in the linking of laboratory instruments directly to data-acquisition computers and analysis software at the LANL, LLNL, and LBL human genome centers. Sequencing The DOE Human Genome Program has supported both evolutionary (incremental, gel-based) improvements to classical sequencing methods and several revolutionary (completely novel, gel-less) technologies. Steady advances have occurred in the evolutionary area with the implementation of automated sample preparation, multiplex sequencing, and strategies that minimize the need for prior subcloning. Gel Sequencing Approaches Multiplex sequencing systems have matured enough for transfer to the commercial sector (G. Church, Harvard Medical School; R. Gesteland, University of Utah). 
The readout of multiplexed gels and blots using stable isotopes as nucleic acid labels has the potential to increase sequencing speeds by at least a factor of 10 because resonance ionization mass spectroscopy is capable of differentiating many isotopes (H. Arlinghaus, Atom Sciences, Inc.; K. B. Jacobson, ORNL). Chemiluminescent label systems are now substituting for the less-desirable radioactive labels in many applications (I. Bronstein, Tropix, Inc.). Systems have been developed to retain chromosome continuity information by bypassing the customary subcloning step in the sequencing of recombinant DNAs (D. Berg, Washington University; C. Berg and L. Strausbaugh, University of Connecticut; J. Dunn and F. Studier, Brookhaven National Laboratory; R. Gesteland and R. Weiss, University of Utah). Fractionation speeds on capillary and very thin slab gels are 10-fold faster than on traditional thick gels (N. Dovichi, University of Alberta, Canada; B. Karger, Northeastern University; L. Smith, University of Wisconsin). The fluorescence/luminescence detection of fractionated nucleic acids has been significantly improved to allow detection of the smaller amounts of DNA loaded on capillary and thin slab gels (N. Dovichi, University of Alberta; R. Mathies, University of California; E. Yeung, Ames Laboratory). Over 300 kb have been sequenced from human and mouse T-cell receptors, providing fundamental new insights into the molecular biology of the immune response (L. Hood and T. Hunkapiller, California Institute of Technology). Gel-less Sequencing Technologies The technology for interrogating or sequencing clones by hybridization with short oligomers has passed a second proof-of-concept test. Three unknown DNA fragments were fully and accurately sequenced (R. Crkvenjakov and R. Drmanac, ANL). 
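The computational side of sequencing by hybridization can be sketched as rebuilding a sequence from the set of short oligomers that it contains. The example below is a textbook-style illustration, not the procedure used at ANL: it assumes an error-free, complete list of overlapping k-mers (with multiplicity) and recovers the sequence as an Eulerian path through a graph of (k-1)-mers. The function names and the test sequence are hypothetical.

```python
from collections import defaultdict


def spectrum(seq, k):
    """All overlapping k-mers of seq, in order of occurrence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def reconstruct(kmers):
    """Rebuild a sequence from its k-mers via an Eulerian path over (k-1)-mers."""
    graph = defaultdict(list)
    indeg = defaultdict(int)
    outdeg = defaultdict(int)
    for kmer in kmers:
        prefix, suffix = kmer[:-1], kmer[1:]
        graph[prefix].append(suffix)
        outdeg[prefix] += 1
        indeg[suffix] += 1

    # The path starts at the node with one more outgoing than incoming edge, if any.
    nodes = set(indeg) | set(outdeg)
    start = next((n for n in nodes if outdeg[n] - indeg[n] == 1), None)
    if start is None:
        start = kmers[0][:-1]

    # Hierholzer's algorithm, iterative form.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    return path[0] + "".join(node[-1] for node in path[1:])


target = "ATGGCGTGCA"
probes = sorted(spectrum(target, 4))   # sorted to discard positional information
rebuilt = reconstruct(probes)
```

In this example every 3-mer of the target is distinct, so the path, and therefore the sequence, is unique. When a (k-1)-mer is repeated within the target, several sequences can share the same k-mer content, and the reconstruction is only guaranteed to be one of them.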
In research and development for single-molecule sequencing by processive nucleotide release, the capacity to detect single nucleotides by laser-induced fluorescence has been demonstrated (R. Keller and J. Jett, LANL). Progress is being made in developing methods to sequence DNA using lasers coupled to a mass spectrometer. The great advantage of these approaches is that the mass spectrum can be acquired in milliseconds (C. Chen, ORNL; J. Jaklevic, W. Benner, and J. Katz, LBL; L. Smith and B. Chait, University of Wisconsin; R. Smith, PNL; P. Williams and N. Woodbury, Arizona State University). Activities Addressing Ethical, Legal, and Social Issues Related to Human Genome Project Data In FY 1991, DOE activities on ethical, legal, and social issues (ELSI) included two conferences, three education projects, and three research projects. The first conference, Justice and the Human Genome, held in November 1991 at the University of Illinois College of Medicine, considered discrimination that could result from the use of genetic information about ethnic and other groups. The second conference, held in March 1992 at the Texas Medical Center Institute of Religion, focused on Genetics, Religion, and Ethics. The three education projects on the science and the societal implications of data produced in the Human Genome Project, listed with their preparers, include (1) a module to be developed and distributed to all U.S. high school biology teachers (Biological Sciences Curriculum Study); (2) an educational television series, "Medicine at the Crossroads," which will address the role of genetics in understanding and treating disease (WNET, New York, cofunded with NIH and the National Science Foundation); and (3) a program of hands-on workshops for public officials and other nonscientists (Cold Spring Harbor Laboratory). 
The three ongoing research projects, listed with the institutions developing them, are (1) a study of ethical issues arising from the rapid proliferation of genetic tests that can predict future disease in otherwise healthy individuals [National Academy of Sciences (NAS) Institute of Medicine, cofunded with NIH]; (2) a legal study of confidentiality protection for genetic data (Shriver Center); and (3) a study to consider problems in funding young investigators in biological and biomedical sciences (NAS). In its first 2 years, the DOE Human Genome Program funded a variety of ELSI activities, noted above. To avoid being spread too thinly, the ELSI component of the DOE Program now focuses on confidentiality and privacy concerns raised by increased genetic data about individuals. This sensitive, personal information, which may predict disorders before symptoms occur or treatments are available, can affect a person's self-image, employability, status in the eyes of others, and ability to obtain health insurance. Since genetic knowledge can also lead to better understanding of disease causation and to more-accurate assessments of environmental affronts, a balance must be achieved between the health of the public and the privacy interests of the individual. The DOE Human Genome Program is funding six new projects covering ELSI activities in research and education. One of the three projects investigating genetic discrimination will compare two states (Florida and Georgia), contrasting their genetic testing, screening, and counseling programs and the impact on different ethnic and socioeconomic communities. Another will examine the impact of two genetic conditions (cystic fibrosis and sickle cell disease) on African-Americans and Caucasians. A third will identify particular social institutions that may engage in discrimination and will consider whether the discrimination, if present, is the result of ignorance or systematic policy. 
A fourth project will explore in detail (a) the effect of genetic knowledge on the right of privacy and (b) the uses of genetic information in public health planning. A fifth project will develop a program of educational workshops for secondary and high school science teachers, focused on both the science and the ethical, legal, and social issues arising from data generated by human genome research. A sixth project will involve a second educational television series, "The Secret of Life" (WGBH, Boston), which will address the current revolution in molecular biology and genetics. Other activities include conferences on Genes and Human Behavior: A New Era? (October 1991); Computers, Freedom, and Privacy (March 1992); and Science, Technology, and Ethical Responsibility (scheduled for June 1992). While very challenging issues are raised by genome research, solutions are not simple; defensible rights often exist on both sides of any issue. Further research is needed, as well as activities to promote public awareness and assist in policy development. Also, with the increasing use of computers to assemble, store, and organize data (including genetic data) into large databases, the issues of security and access control become more acute. To begin reorienting and better defining the scope of ELSI activities in the DOE program, the DOE-NIH Joint ELSI Working Group has established a collaborative effort on privacy to identify an ELSI research agenda and develop a more detailed approach to some of these concerns. Technology Transfer and Industrial Collaboration Technology transfer, considered one of the three most important facets of the DOE mission (along with meeting the nation's defense and energy needs), is enhancing U.S. investment in research and technological competitiveness. By creating new products, markets, and jobs, the rapid deployment of technology from the research laboratory to the marketplace can play an important role in vitalizing the U.S. economy. 
A vast potential exists for commercial development of genome resources and technology; applications to clinical medicine have already begun. All participants in the Human Genome Program are encouraged to engage in active collaborations with the private sector and transfer their resources and technologies for commercial development. Each national laboratory has a technology transfer office. The LLNL, LBL, and LANL human genome centers provide a variety of opportunities for collaborations on joint projects or for obtaining direct access to technology. They are also exploring additional ways to increase cooperation with the private sector; a number of interactive projects are now under way, and additional interactions are in the preliminary stages. In some instances, private industries are marketing technologies developed at DOE-sponsored research laboratories and are providing research funds or other resources to the centers; other collaborative programs involve joint development of technologies and their applications to achieve project goals. One mechanism being used by the DOE national laboratories is the Cooperative Research and Development Agreement (CRADA). The first CRADA in the genome project, established by DOE in the spring of 1991, was between Life Technologies, Inc. (LTI) and the LANL Center for Human Genome Studies for technologies developed in the single-molecule sequencing project. In this project an LTI-modified DNA polymerase will be used to label a single DNA strand with four different fluorescent, base-specific tags. After an exonuclease cuts the labeled nucleic acid base pairs from the DNA, the labeled bases will be induced to fluoresce as they pass sequentially through a focused laser beam. The bases can be identified and counted by a sensitive photodetector (see figure on p. 25 for more information). If successful, the technology will allow sequencing of 50,000-bp DNA fragments at 1000 bp/s. 
LTI will have the first opportunity to license products resulting from the joint effort and would pay royalties to LANL under such a license. Potential commercial advancements in the Human Genome Program have also been recognized outside the genome community. Research and Development magazine selected an achievement by Edward Yeung and other Ames Laboratory scientists as one of the 100 most significant developments of 1991. This R&D 100 Award was given for the development of a user-friendly instrument that detects with extremely high sensitivity the fluorescent molecule concentration (based on laser-excited fluorescence), an improvement that may lead to routine high-speed DNA sequencing by capillary gel electrophoresis. A U.S. patent for portions of this technology has been issued, and several commercial manufacturers are considering the possibilities of marketing the instrument. A technology pioneered by LLNL to identify chromosomal abnormalities (e.g., aneuploidy, translocations, and deletions) has been licensed to Imagenetics, Inc., a medical diagnostics firm that will manufacture the technology and provide funding for future research and development. This technology involves the use of specially developed fluorescent dyes called Whole Chromosome Paints™ to detect diseases such as cancers and leukemia. Whole Chromosome Paints are being marketed by LTI. Some other technology transfers from DOE-sponsored genome research, both at the national laboratories and extramurally, are highlighted below. In progress or awaiting finalization are many more developments and agreements, some of which cannot be disclosed at this time because of their proprietary nature. Resources. Collaborative agreements have aided in the further development of several new technologies used in genome research, as well as in their commercial applications. 
New methods are being evaluated for use in isolating mRNA, chromosomes, and restriction fragments; in amplifying hybridization signals; and in extending DNA molecules. In addition, bacterial host strains have been developed that give greater stability to cosmid constructs containing human DNAs. Improvements are being made in DNA detection methods by the development of new probes, stains, and fluorescent dyes. As a result of the recent cloning of the fragile X gene, several companies are negotiating for licenses to develop assays for diagnosing fragile X syndrome, probably the most frequently inherited form of mental retardation. Hardware. Automation and enhancement of data collection and analysis has been the goal of many collaborations with the commercial sector. Equipment is being designed to automate (1) the production of high-density arrays on agarose or filters and (2) clone fingerprinting by gel electrophoresis (as well as the data collection and analysis software). Advanced applications for robotic systems are also being developed. The resolution of DNA fragments is being enhanced by improvements in pulsed-field gel electrophoresis. Resonance ionization spectroscopy is being modified to enable rapid detection of stable isotope labels on DNA following gel electrophoresis. A commercial gel scanner is being developed for reading DNA gels. Software. To aid physical map construction, programs are being designed for efficient clone analysis. Several other image-analysis programs are being developed, including data-capture software for images from video screens in combination with a DNA molecule imaging system. Sequencing. Multiplex sequencing technologies are being used to sequence pathogenic microbes. 
Human Genome Center Research Narratives Lawrence Berkeley Laboratory Since its inception in 1987, the Lawrence Berkeley Laboratory (LBL) Human Genome Center has focused on developing the necessary research and analytical technology to speed genome mapping and decrease the cost of sequencing. Over the last year, LBL has strengthened its ties with the University of California, Berkeley, particularly in the biological sciences. This collaboration fosters interdisciplinary activities in biology, instrumentation, and informatics. Biology The biology component at LBL is concentrating on developing and improving mapping and sequencing strategies for human chromosome 21. To achieve these goals, investigators in each biology project draw on the expertise of the center's instrumentation and computing groups. Two major biology projects are under way, and a third is in development. Physical mapping at LBL is focused on a 10-Mb region of human chromosome 21, and over 90 unique chromosome 21-specific yeast artificial chromosomes (YACs) have been located by fluorescence in situ hybridization (FISH). A new method has been developed that permits rapid isolation of chromosome-specific YACs, using probes isolated from flow-sorted chromosome libraries from Lawrence Livermore National Laboratory. In addition, cDNAs specific to a given YAC are being isolated by an automatable procedure based on magnetic beads. The second major biology effort involves testing new approaches to physical mapping and genomic sequencing. These projects exploit current methods, such as FISH and appropriate pooling strategies, for efficient isolation of overlapping clones. In addition, new work has begun on subcloning and ordering libraries of clones for mapping and on the use of gamma delta transposons as the primer site for sequencing studies. Increased efficiency in constructing physical maps results from a clone-limited strategy for generating maps based on sequence tagged sites (STSs). 
This nonrandom selection strategy reduces the number of STS assays required and produces contigs that cover a larger fraction of the genome. The third biology project is aimed at developing automated methods for generating genetic maps. A simple filter assay will be used to detect heterozygosity at mapped loci in yeast, mice, and human DNA samples. Instrumentation The instrumentation program within the LBL Human Genome Center has two major areas of effort: (1) biology and instrumentation development and support and (2) new instrumentation development based on emerging technologies. Supporting activities include the design and fabrication of gel boxes, automation of protocols on existing robotic frameworks, and the installation and networking of a variety of image-acquisition systems. In addition, advanced robotic [high-speed colony picking, robotic-based polymerase chain reaction, and DNA synthesis] and laboratory systems integration is under development. Efforts to produce new, adaptable technologies for the genome program include optimizing large-molecule detection systems; designing versatile optical fluorescence systems for multiplex labeling; and developing microfabricated arrays for application to large-scale clone libraries, sequencing by hybridization, and other procedures. The use of computer-controlled robotic systems provides a mechanism for automatically capturing the vast amount of data generated by laboratory operations. This requires a close coordination between hardware and software development in laboratory system design that goes far beyond automation of a few discrete protocols. Informatics A major part of the computing and instrumentation effort is driven by biology projects. The center's computing group focuses on specific applications in four major areas: raw data acquisition and analysis, information tracking and management, data interpretation and comparison analysis, and development of software tools. 
Visual data for mapping (including in situ pictures, autoradiograms, ethidium gels, and chemiluminescent staining) are handled by BioPix, a set of programs that assemble and integrate data from image capture to analysis. A similar system is being developed for sequence data. The Chromosome Information System (CIS) allows biologists to search, edit, and compare various maps, markers, and related reference information and to interact with other programs to exchange data. The laboratory data analysis system uses existing software packages and provides system management and support throughout the center. New, in-house analysis packages are being devised for sequence alignment and assembly. Software development tools permit rapid design and modification of database management systems, thus facilitating increased productivity, vendor independence, and conceptual clarity. Achievements * Over 90 independent YACs averaging 100 kb were regionally assigned to human chromosome 21 by FISH. These YACs include genetic markers to help integrate maps. * Two hundred unique probes were isolated for chromosome 21 and are being used to identify YACs from genomic libraries. * A rapid cDNA clone-screening method uses immobilized YAC clones to screen cDNA libraries, which are then localized on specific chromosomes. An alternative screening method uses individual YACs or cosmids attached to magnetic beads to isolate specific cDNAs, a method that can be readily automated to speed identification of coding sequences for physical mapping. * Marker-selected libraries, highly enriched for clones containing (CA)n repeats, were constructed from primary genomic libraries. These enriched libraries increase the efficiency of screening almost 50-fold. * A probe-mapping procedure determines the distance between the probe and the chromosome or YAC end. This method, which uses X rays to break large DNA pieces randomly, can be used to map cDNAs and to estimate the length of entire genes. 
* A double-ended, clone-limited strategy for physical mapping of chromosomes was devised. This strategy maps chromosomes on the order of 100 Mb and should result in larger contigs with a minimum of assays. * CIS, developed by the genome center computing group, was used to produce consensus maps at workshops on human chromosomes 3 and 21 and is being expanded for use with a number of plant species in the Plant Genome Program of the U.S. Department of Agriculture. * High-level database design tools have been developed to permit molecular biologists to define data objects in a way that captures biological concepts. The software automatically generates low-level commands for a commercial database management system, facilitating the evolutionary development of modular system components. These tools are also being used by researchers to design the Superconducting Super Collider database and the Integrated Genome Database. * A variety of mechanical, electrical, and chemical means have been used to manipulate DNA molecules; these methods include stretching molecules physically by externally applied electrical fields and guiding the molecules through grooves in a glass surface; digesting and separating single molecules; and picking up, transporting, and releasing DNA with scanning tunneling microscope (STM) tips. * Investigation of the feasibility of using STM for visualizing the individual bases of single-stranded DNA has shown that while purines and pyrimidines can be distinguished from each other, two bases in the same class cannot be differentiated by this method. * A fast, filter-based assay was developed to identify single base-pair polymorphisms, eliminating the need for gel assays. * Higher throughput was achieved through the construction of a dedicated high-speed colony-picking workstation. The pick rate is 10 to 20 times faster than the initial picking system and both faster and more accurate than a highly qualified human. 
The new picker arrayed an entire library of over 10,000 clones in 1 day. * Robots have been modified for use with a number of chemistry protocols, including cosmid and YAC library replication, various pooling schemes, and high-density filter array production. Using the robot to replicate libraries has made copies available to researchers in the private sector and in other national laboratories. Future Plans * Construction of a 10-Mb contig of human chromosome 21 based on overlapping YACs. The sequence will be determined by the most efficient strategy available. * Sequencing of a P1 clone. Subclone assembly will use a nonrandom strategy, and primer sequences will originate in the transposon gamma delta. * Construction of chromosome genetic maps of human chromosomes 16 and 19 in collaboration with other DOE genome centers. A simple gel-based heterozygosity assay is being developed to support this research. * Development of a computational biology program within the computing group to design and implement new algorithms for sequence assembly. Preliminary data will come from collaborations with other genome centers. * Design and implementation of a software tool suite for managing information and for optimizing the unique strategy of particular research groups. As large-scale sequencing projects develop, new acquisition and analysis software will be integrated into CIS. * Implementation of QUEST, a database tool that will provide a single entry point to the conceptual data model. QUEST will then implement automatically any changes in the user interface, the database query procedures, and the database schema definition. * Optimization of improved detectors and the associated mass spectrometry system for large biological molecules. * Automation of handling and analysis of dot-blot hybridization experiments and the implementation of a high-speed colony-picking apparatus. 
For more information on the LBL Human Genome Center, contact Jasper Rine, Director, or Sylvia Spengler, Deputy Director, at 510/486-4943. Lawrence Livermore National Laboratory The Human Genome Center at Lawrence Livermore National Laboratory (LLNL) is a multidisciplinary team effort that brings together chemists, biologists, molecular biologists, physicists, mathematicians, computer scientists, and engineers in an interactive research environment. Many of these individuals have previously collaborated on research projects in molecular biology, cytogenetics, mutagenesis, and instrumentation, as well as in the National Laboratory Gene Library Project (NLGLP). These projects have contributed substantially to the identification and characterization of human DNA repair genes, specifically the three on chromosome 19 that are a focus of interest at LLNL. The short- and long-term goals of the LLNL effort are to (1) develop biological and physical resources useful for genome research, (2) model and evaluate DNA mapping and sequencing strategies, (3) couple these resources and strategies in an optimal way to construct ordered clone maps and DNA sequences of human chromosomes, and (4) use the map and sequence information to study genome organization and variation. To achieve these goals, the Human Genome Center is organized into three broad research and support areas, each consisting of multiple projects led by a principal investigator. Extensive interaction occurs within and among all projects that have as their common goal the construction of ordered clone maps of the human genome. The program structure of the center includes a core facility and projects that focus on physical mapping and enabling technologies. Research and Support Areas Coordination and collaboration take place with other research groups throughout the world that are involved in the genome initiative or other mutual scientific interests. 
The role of LLNL in the Human Genome Project is seen as encompassing several areas, including technology development, map construction, map interpretation, and integration with ongoing and new programs in structural biology and mutagenesis. The following three components are highly interactive; individual staff members often have responsibilities in more than one component. Core facilities. The administrative group is concerned with budget oversight, external and internal meeting coordination, preparation of center reports, training coordination, property and space management, safety oversight, and secretarial support. The scientific core provides general support to the physical mapping effort, including cell culture and DNA extraction; library, probe, and clone management; oligonucleotide synthesis; fluorescence-based restriction mapping; and DNA sequencing. The core also facilitates material distribution to collaborators in the external community. Mapping activities. Five projects represent the coordinated effort to obtain an overlapping set of clones for human chromosome 19 and to further characterize genomic organization: * Assembly, closure, and characterization of a chromosome 19 contig map. The goal of this project is to construct an overlapping set of cosmid clones using a variety of techniques. An automated fluorescence-based restriction-fragment fingerprinting strategy is used to establish a foundation map of cosmid contigs. The contig closure effort will focus on using yeast artificial chromosomes (YACs) and cosmids with two hybridization-based techniques; one is based on fragments generated from Alu sequence primers or sequence tagged sites (STSs) by the polymerase chain reaction (PCR) and the second on RNA transcripts generated from the ends of cloned inserts. * Interdigitation of the physical and genetic maps of human chromosome 19. 
The goals of this effort are to locate known genetic markers on the expanding contig map, to coordinate the isolation of chromosome 19-specific STSs, and to localize them on the cosmid map.
* DNA sequence mapping by fluorescence in situ hybridization (FISH). This project exploits the power of FISH on metaphase chromosomes, interphase cells, and pronuclear DNA. FISH will be used to determine the location of genes of interest and the relative order and orientation of the cosmid contigs.
* cDNA mapping. The goal of this project is to isolate, sequence, and map cDNAs, expressed in a variety of human tissues, that will become the STSs on which future studies of genetic organization and gene function will be based.
* New mapping strategies. New methods useful for library construction, contig closure, and overlap detection will be developed and validated. Focus is on improving Alu-PCR-based technology and pooling schemes to achieve closure of the chromosome 19 map with cosmids and YACs.

Enabling technologies. The following groups provide computational, resource, and instrumentation support for research activities:
* Computational support for the Human Genome Center. This group is responsible for mathematical modeling of mapping and sequencing strategies and the development and application of data analysis algorithms and software. They are also responsible for the construction and maintenance of interactive relational databases that enable internal and external data access, including development of graphical visualization tools.
* NLGLP. This project, a joint effort with Los Alamos National Laboratory, draws upon LLNL experience in flow instrumentation and chromosome sorting to construct human chromosome-specific libraries in lambda and cosmid vectors for use in physical mapping and other studies.
* Instrumentation for cytogenetics and gene mapping.
This group is responsible for developing instrumentation to facilitate flow systems analysis and chromosome sorting and to support FISH. Accomplishments The LLNL Human Genome Center has made excellent progress in the construction of an ordered set of cosmids for chromosome 19, the development and application of new biochemical and mathematical approaches for constructing ordered clone maps, the automation of fingerprinting chemistries, and high-resolution imaging of DNA. Major accomplishments are highlighted below. * Considerable progress has been made toward the closure of the chromosome 19 physical map. More than 10,000 cosmids have been analyzed by an automated fluorescence-based fingerprinting approach and assembled into over 870 contigs that span about 80% of the chromosome. FISH has been used to locate over 400 cosmids and 117 contigs on the cytological map, and more than 70 known genetic markers have been located on cosmid contigs. Closure of the gaps between contigs is under way using YACs and cosmids. * Cosmid contigs analyzed in the carcinoembryonic antigen (CEA) gene family region of chromosome 19 were found to be tightly linked over relatively short stretches of DNA. This gene family of about 22 members appears to span a contiguous region of about 1 Mb. With probes made from the ends of these contigs, hybridization techniques were applied to join contigs established by fingerprinting into larger contigs. In addition, almost 2 Mb surrounding the myotonic dystrophy locus were linked with cosmids and YACs. * More than 20 clones containing DNA sequences corresponding to a number of important genes and regions that map to chromosome 19 were isolated from two separate YAC libraries. Among these clones were the region encoding the LDL receptor and ApoE gene, two important components of the regulation of cholesterol and triglyceride metabolism in humans. 
Similarly, a region was isolated that encodes a family of serine proteases called kallikreins, whose role is the specific proteolytic activation of peptide hormones and growth factors. Clones of these regions are being used for the structural analysis and mapping of these genes.
* A structural defect found in the cloned gene linked to the autosomal dominant disease myotonic dystrophy has been identified through an international collaboration. This chromosome 19 defect, which is characterized by a tandemly repeated segment of DNA within or close to the coding region on q13.3, is similar to that seen in the fragile X syndrome. The extent of the amplified region appears to be associated with the severity of the disease.
* The gene for DNA ligase 1 was mapped to the long arm of chromosome 19. A defect in this gene may be associated with increased cancer risk. This is the fourth gene involved in DNA metabolism that has been mapped to this region of chromosome 19.
* Significant progress was made in defining the organization of the cytochrome P450 genes mapping to chromosome 19. Multiple members of each of the three subfamilies were identified. The cosmids containing these genes will be useful resources for studies of the function and physiological importance of the genes.
* Three levels of resolution of FISH have been developed and applied to localize and orient cosmids. Localizing cosmids to metaphase chromosomes provides a resolution of about 1 to 3 Mb. Localization to somatic interphase cells gives a resolution of 50 kb to 1 Mb, and hybridization to sperm pronuclei gives a resolution of 20 kb to 1 Mb. With FISH, a linear relationship was demonstrated between physical distance and genomic distance for genomic distances from 20 kb up to at least 800 kb in pronuclei derived from human spermatozoa. With a single probe, the presence of multiple copies of the closely related genes of the CEA family has been detected in human sperm pronuclei.
Single and multicolor hybridizations are routinely performed.
* A reproducible method of mapping YACs by FISH has been developed. This procedure involves isolating YACs with pulsed-field gels, digesting with the restriction enzyme Mbo I, ligating to oligonucleotide linker adapters, and amplifying with PCR. The products are then mapped onto human metaphase chromosomes by standard FISH methods.
* The technique of Alu-PCR has been further exploited. To isolate region-specific DNA probes from human-rodent hybrid cell lines, previously developed PCR procedures were expanded. Human sequences are preferentially amplified using PCR primers specific for repeats of the human Alu repeat family. Several new primers have been developed that amplify human DNA sequences very efficiently, further facilitating probe isolation from human genome regions present in the available hybrids. Many different human sequences amplify from the hybrids; individual probe sequences are obtained by subsequent cloning in plasmid vectors in Escherichia coli. To expedite this, ligation-independent cloning has been developed to increase efficiency of cloning and eliminate the common background of clones that do not contain recombinant DNA molecules. In addition, an efficient procedure has been developed to clone the PCR products common to two cell lines. This method, coincidence cloning, permits a further enrichment for sequences derived from defined regions of the genome.
* Clone-pooling schemes have been developed to facilitate screening of both cosmid and YAC libraries. Each clone is present in a number of different pools, reducing the number of DNA samples that must be deposited on a high-density filter for hybridization-based screening and the number of tubes needed for PCR-based screening. Since each clone is defined by a unique combination of pools, the screening of pools by probe hybridization permits identification of the recombinants shared by a number of pools.
This approach was used very successfully to screen a 10,000-clone cosmid library. The idea also was used to consolidate a 60,000-clone YAC library into about 1800 sample pools. Results demonstrated that hybridization-positive YAC pools can, indeed, be distinguished from hybridization-negative YAC pools, thus allowing the efficient identification of YAC clones. * Human YACs were isolated from a library constructed using a monochromosomal 19 hybrid cell line. The YACs vary in size between 120 and 350 kb. One of the analyzed YACs carries sequences from the telomere region of chromosome 19, and another maps to the centromere region of chromosome 19 by FISH. * A second-generation suite of robust, reliable computer programs was completed for signal preparation and analysis of chromosome 19 restriction fragment fingerprints. These programs implement methods for random noise suppression, background subtraction, and color decorrelation. A new program (TIMEWARP) was also completed to map peak locations in a gel to a common coordinate system by dynamic programming and shape-preserving spline interpolation. * The Sybase database has been enhanced to contain all the laboratory notebook and experimental data important to physical map construction. This includes clone repository information, restriction fragment fingerprinting, and data on probe hybridization and FISH. The database is coupled to the graphical browser so the end user can retrieve many of the experimental results in graphical form. * The graphical database browser was enhanced to run Human Genome Project data remotely over Internet. The browser's ability to link to multiple databases at external collaborator sites has been demonstrated. * In a collaborative effort, automatic transnetwork methods for transferring physical mapping results to the central Genome Data Base (GDB) at Johns Hopkins were built, tested, and implemented by GDB and LLNL. 
This work was in support of DOE concerns that all laboratories should effect mechanisms to ensure that data are made available to the appropriate public databases after a suitable time period. Prototype methods were implemented, tested, and publicly demonstrated for logically linking our database with the major sequence and mapping databases (GenBank and GDB). Direct transnetwork queries that logically integrate these data sets are now feasible.
* As part of NLGLP, high-speed flow sorting was used to purify individual human chromosomes for cloning. Large-insert phage and cosmid libraries have been made for chromosomes 9, 12, 18, 19, 21, 22, and Y. Several libraries have been distributed to users and evaluation sites. In addition, the high-speed sorter has been rebuilt with new fluidics to optimize sterility and with new electronics to increase the purity of the sorted material.
* Construction of a new high-speed chromosome sorter was completed. This instrument has new digital acquisition electronics, a new fluidic system, and a more stable sample stream. The instrument analyzes chromosomes at the rate of up to 20,000/s and can reliably produce 250 to 1000 ng of sorted chromosome DNA equivalents per day.
* Using scanning tunneling microscopy (STM), individual images of the bases adenine and thymine were obtained at atomic resolution, indicating that a scanning-probe microscopy technique can discriminate between purines and pyrimidines.
* Several technologies have been transferred to industry. They include software for analysis and graphical display of physical map data, sequence information for the commercialization of Alu-PCR primers, and vectors for the construction of cosmid libraries.
In addition, collaborative research programs with industry have continued in the areas of fluorescence-based restriction fragment analysis, development of pulsed-field gel systems, development and testing of automated and high-throughput plasmid/cosmid DNA extraction, and development and testing of a robot for high-density colony replication on filters. Future Plans The LLNL genome center's first priority is to complete, to the extent possible, an ordered clone map of chromosome 19; this physical map will likely be a composite linear array of cosmid, lambda, and YAC clones. It will be correlated with the genetic map to assist the scientific community in localizing and isolating all genes from chromosome 19. State-of-the-art technology will be used to sequence selected high-interest regions of the chromosome. Once the technology has been validated for map construction of a large portion of chromosome 19, efforts will be directed to chromosome 2. When Human Genome Project emphasis shifts from mapping to sequencing, exploration will turn to rapid automated DNA sequencing methods that can use large fragments such as cosmids or YACs as templates. STM and X-ray imaging technologies under development at LLNL are expected to contribute to advancements in sequencing. Automation is an essential element of physical mapping. New processes and instruments will be explored to reduce the need for human intervention in highly repetitive tasks. A number of instruments for clone manipulation and biochemical processes will be considered for automation. An effort to map and sequence the cDNAs expressed in a variety of human tissues has recently been initiated. These cDNAs will be used to generate STSs and will serve as the foundation for future studies of gene organization and gene function. Assisting the scientific community in completing ordered clone maps is critical and will remain a high priority. 
LLNL intends to serve as a resource laboratory for clones and for map information on chromosomes of interest. Ultimately, map and sequence information will be used to study the global architecture of the chromosome and also to evaluate human somatic and genetic variation, both spontaneous and induced. For more information on the LLNL Human Genome Center, contact Anthony Carrano, Director, at 510/422-5698 or Leilani Corell, Administrator, at 510/423-3841. Los Alamos National Laboratory The Center for Human Genome Studies at Los Alamos National Laboratory (LANL) provides direction, coordination, and technical oversight for the LANL portion of the DOE Human Genome Program. The center draws scientific talent from six technical divisions at LANL. Molecular biologists, chemists, physicists, mathematicians, computer scientists, and engineers are contributing to progress in physical mapping, technology development, and informatics. Although a specific goal is the assembly of a complete physical map for human chromosome 16, much of the work is broadly supportive of the worldwide Human Genome Project. Collaborative research and development programs have also been initiated with private-sector and other institutions involved in human genome research. The major technical subdivisions of the center are physical mapping, technology development, and informatics. Activities are also under way at the center to explore ethical, legal, and social issues arising from genome research data and to transfer technology developed within the center's projects. Physical Mapping Physical mapping includes the development of conceptual advances in mapping strategy and the construction of a physical map of chromosome 16. The physical map will be composed of phage, cosmid, and YAC contigs ordered by repetitive sequence fingerprinting. These ordered contigs will be integrated with the genetic linkage map, the cytogenetic map, and known gene sequences on chromosome 16. 
The final map, along with its eventual translation into a sequence tagged site (STS) map, will provide the means for rapid access to any region of the chromosome for further analysis. In addition, the ordered clone sets will be available for eventual sequencing.

Technology Development

Technology development efforts include the application of robotics to the handling and storage of DNA fragments, the development and application of methods for the construction of DNA libraries from flow-sorted chromosomes, and the development of new methods for rapid, inexpensive, large-scale sequencing. All these projects are or will be supportive of the physical mapping of chromosome 16, and they also contribute to the larger genome program. For example, the construction and distribution of various kinds of libraries from sorted chromosomes is playing a significant role at many of the genome research centers.

Informatics

Informatics efforts involving the collection and analysis of genome-related data will play an increasingly important role in the genome project. LANL has a long history of expertise in this research area and will continue to lead in providing these essential resources.

Ethical, Legal, and Social Issues (ELSI) Activities

The center also sponsors active participation in ELSI studies related to data produced by human genome research and is compiling a comprehensive literature bibliography in collaboration with Georgetown University. LANL scientists participated in a series of discussions on ELSI issues sponsored by the University of California Humanities Research Institute.

Technology Transfer

LANL will continue to put a high priority on collaborations with private industry to use the skills and resources of the private sector and to ensure effective technology transfer to the U.S. commercial sector. The first Cooperative Research and Development Agreement (CRADA) involving human genome research activity was signed in 1991 by LANL and Life Technologies, Inc. (LTI).
Recent Progress and Future Directions

Construction of a physical map of chromosome 16. The chromosome-mapping strategy at LANL involves the rapid generation of cosmid contigs representing around 60% of the target chromosome, followed by directed gap closure with yeast artificial chromosomes (YACs). The first phase of this goal, the rapid generation of nucleation contigs on chromosome 16, has been completed [Stallings et al., Proc. Natl. Acad. Sci. USA 87: 6218-22 (1990)]. An approach for identifying overlapping cosmid clones by exploiting the high density of repetitive sequences in human DNA was used to generate 553 contigs following the fingerprinting of over 4500 individual cosmid clones. These contigs represent more than 80% of the euchromatic arms of chromosome 16 and were constructed with about one-fourth as many cosmid fingerprints as random strategies requiring 50% minimum overlap detection. Nucleating at specific regions allows (a) the rapid generation of large (>100 kb) contigs in the early stages of contig mapping and (b) the production of a contig map with useful landmarks [i.e., (GT)n repeats] for rapid integration of the genetic and physical maps. All 4500 fingerprinted cosmids in contigs and singlets have been rearrayed on high-density filters. Such filters already provide investigators with access to more than 90% of chromosome 16, with a 60% probability that any region is already present in a contig. These high-density chromosome-specific cosmid filter arrays have also proved useful for YAC fingerprinting with repetitive sequence polymerase chain reaction (PCR) techniques. In collaboration with the laboratories of David Ward (Yale University) and David Callen (Adelaide Children's Hospital, Australia), 130 of these arrayed cosmids have been regionally localized via in situ hybridization or somatic cell hybrid panels. The average gap (containing only singlets), approximately 65 kb in length, can be easily closed with YACs.
A single walk from each end of current contigs should, statistically, reduce the number of contigs to approximately 50, one of the 5-year goals of the Human Genome Project (i.e., 1- to 2-Mb contigs; >95% coverage). To facilitate closure, LANL investigators are constructing from monochromosomal hybrids and flow-sorted material both a total genomic YAC library (from cell line GM130, using the vectors pJS97 and pJS98; currently onefold representation) and chromosome 16 YAC clones. One hundred STS markers are being generated to key contigs. Extensive analyses of the DNA sequences obtained from contig ends are in progress using multiple approaches to identify potential coding regions. These approaches include nucleotide and translated amino acid sequence homology searches against GenBank, using BLAST and FASTA, and the new adaptive network program, GRAIL, developed and made available by the Oak Ridge National Laboratory. Current progress with YAC closure indicates that the complete physical map of chromosome 16 will be achieved in the next few years. Low-abundance repetitive DNA sequences identified on chromosome 16. Chromosome 16-specific, low-abundance repetitive DNA sequences (designated CH16LARs) have been identified during construction of the cosmid contig map of this chromosome. CH16LARs were initially identified by in situ hybridization of cosmid and YAC clones to normal human chromosomes (in collaboration with David Ward). The cosmid clones all came from contig 55. The hybridization signals were unusually intense and occurred on four regions of human chromosome 16: bands p13, p12, p11, and q22. Contig 55 contains more clones than any other contig (78 clones or 2% of all clones fingerprinted thus far). Ordering clones within contig 55 is not possible because the presence of these low-abundance repetitive DNA sequences has generated false overlaps. The regions containing CH16LARs may cover as much as 5% of the euchromatic arms of chromosome 16 (~5 Mb of DNA). 
One CH16LAR sequence (CH16LAR1) was cloned and sequenced, and a minisatellite type of repetitive sequence was identified. The region containing CH16LARs is of biological interest since the pericentric inversion breakpoints commonly found in myelomonocytic leukemia fall within these regions [Mitelman, Hereditas 104: 113 (1986)]. Alternative strategies for mapping and ordering clones from this region are being implemented.

Construction and distribution of DNA libraries from flow-sorted chromosomes: National Laboratory Gene Library Project (NLGLP). NLGLP is a cooperative project between LANL and Lawrence Livermore National Laboratory. Investigators at LANL have cloned a set of complete digest libraries into the EcoR I insertion site of Charon 21A; they are available from the American Type Culture Collection, Rockville, Maryland. Sets of partial digest libraries in the cosmid vector sCos1 and in the phage vector Charon 40 are being constructed for human chromosomes 4, 5, 6, 8, 10, 11, 13, 14, 15, 16, 17, 20, and X. Individual human chromosomes are first sorted from rodent-human hybrid cell lines until about 1 µg of DNA has been accumulated. The sorted chromosomes are then examined for purity by in situ hybridization, and the DNA is extracted and partially digested with the restriction enzyme Sau 3AI, dephosphorylated, and cloned into vectors. Partial digest libraries have been constructed for chromosomes 4, 5, 6, 8, 11, 13, 16, 17, and X. Purity estimates from sorted chromosomes, flow-karyotype analysis, and plaque or colony hybridization indicate that most of these libraries are 90 to 95% pure. Additional cosmid library constructions and arrays of libraries having five- to tenfold genomic coverage into microtiter plates are in progress. Libraries have been constructed in M13 or bluescript vectors to generate STS markers for selecting chromosome-specific inserts from a genomic YAC library.
LANL has also cloned sorted DNA into YAC vectors and expects to construct a series of YAC libraries representing individual chromosomes (see below).

A YAC library for human chromosome 21. YACs have been constructed using DNA isolated from aliquots of flow-sorted human chromosome 21. Chromosomes were prepared from the somatic cell hybrid WAV-17, which contains chromosome 21 as the only human chromosome. DNA isolated from sorted chromosomes was restricted with either Cla I or Eag I or both Not I and Nhe I, ligated to YAC vectors pJS97 and pJS98, and transformed into Saccharomyces cerevisiae strain YPH 250. The transformation efficiency of YACs ranged from 600 to 2500 cfu/µg of sorted DNA. About 1200 human YACs with an average size of 200 kb have been identified. The locations of 20 random YACs on chromosome 21 were confirmed by hybridization to somatic cell hybrid mapping panels. Three YACs that hybridize to D21S55 have been identified and are being used to initiate construction of a physical map of the Down's syndrome region of chromosome 21. Sixty YAC clones from the chromosome 21 library were localized on chromosome 21 by in situ hybridization. The results indicate that the library contains inserts that are well distributed along the length of the chromosome and that the frequency of chimeric inserts is low (below 3%). A collaboration between the genome centers at LANL and Lawrence Berkeley Laboratory (LBL) will use the library for comprehensive physical mapping of chromosome 21. The ability to construct chromosome-specific YAC libraries from sorted chromosomes will facilitate isolation of disease genes and construction of long-range physical maps of complex genomes. LBL is working on chromosome 21 in cooperation with LANL.

Chromosome-specific STS libraries. Specific STSs have been systematically generated using flow-sorted chromosomes.
DNA from about 200,000 chromosomes was digested with either one or two restriction enzymes (usually BamH I and Hind III) and cloned directly into bacteriophage M13mp18. One-pass sequencing was conducted, either manually or with a Dupont Genesis 2000 automated sequencer. DNA sequences were analyzed for the presence of sequence similarity to common human repetitive sequences, and appropriate PCR oligomers were synthesized. An acceptable STS-PCR assay yielded the appropriately sized product from both the hybrid cell line DNA containing only the human chromosome of interest and the pools of 384 anonymous YAC clones, spiked with 5 ng/ml total human DNA. To date, over 340 kb of anonymous DNA sequence from human chromosomes 5 and 7 have been analyzed. Two hundred STS markers for chromosome 7 have been generated in collaboration with Maynard Olson's laboratory at Washington University [Green et al., Genomics (in press)], and the first 100 STS markers for chromosome 5 are currently being generated in collaboration with John Wasmuth's laboratory at the University of California, Irvine; 50 STSs for chromosome 5 have been regionally localized. The overall efficiency of PCR reactions yielding appropriate products, with the anonymous genomic sequences from flow-sorted chromosomes, has been approximately 75%. GRAIL analyses indicate that approximately 15% of both the chromosome 16 STSs and the randomly selected STSs for chromosomes 5 and 7 contain putative coding regions. Informatics. The Laboratory Notebook database, designed to manage all information necessary for map assembly, has been expanded to include sequences, STS mapping information, and grid hybridization data, as well as clone fingerprints and completed maps. The forms-based interface is being expanded to provide easy access to the new tables. Graphical interfaces and innovative algorithms to aid map assembly have been prototyped and are being refined. Integrated, multilevel maps are increasing in importance. 
A strong emphasis for the coming year will be to implement the Software for Integrated Genome Map Assembly (SIGMA) system, which was designed to aid in display, assembly, evaluation, and editing of integrated maps. DNA sequencing based upon single-molecule detection in flow cytometry. This project addresses the problem of rapidly sequencing bases in large fragments of DNA. A DNA fragment of about 40 kb will be labeled with base-identifying tags and suspended in the flow stream of a flow cytometer capable of single-molecule detection. The tagged bases will be sequentially cleaved from the single fragment and identified as the liberated tag passes through the laser beam. A sequencing rate of 100 to 1000 bases/s on DNA strands of around 40 kb is projected [Genet. Anal. 8: 1 (1991)]. Accomplishments of this project are as follows: * Signed CRADA with LTI for joint research on DNA sequencing. LTI will offer expertise in nucleic acid chemistry and enzymology, and LANL will specialize in detection technology and DNA handling. LTI will commercialize the technique [for more information, refer to the figure on p. 25 and to Human Genome News, 3(1): 5 (May 1991)]. * Detected several different kinds of single fluorescing molecules with ~85% efficiency and low error rates [Chem. Phys. Lett. 174: 553 (1990)]. * Observed photon bursts simultaneously from rhodamine-6G and Texas Red, using both a doubled Nd/YAG and a synchronously pumped dye laser for excitation and dual-wavelength detection. * Synthesized DNA fragments up to 500 nucleotides long that contain one fluorescent nucleotide and three normal nucleotides. DNA synthesis was observed with rhodamine-dCTP, rhodamine-dATP, rhodamine-dUTP, fluorescein-dATP, and fluorescein-dUTP. This work was a collaboration with LTI. * Digested the fluoresceinated DNAs described above by six different exonucleases: native T4 polymerase, native T7 polymerase, Klenow fragment of Escherichia coli pol I, exo III, E. 
coli pol III holoenzyme, and snake venom phosphodiesterase. LTI also participated in these investigations.

Robotic workcell for DNA filter array construction. A gantry robot-based workcell has been assembled to array small spots of DNA in an interleaved format. Grid densities on these membrane filters can be varied from 576 to 9216 spots per 22 cm². The robot picks a microtiter plate from a dispenser, scans a barcode label, removes the plate cover, and inserts a 96-pin gridding tool into the plate wells. The tool is then positioned at the appropriate place on the membrane, and the solutions on the pins are transferred as spots. The gridding tool is washed and sterilized, the lid replaced on the microtiter plate, and the plate placed into a receiving stacker. The entire sequence is repeated with new plates until the desired array has been constructed.

For more information on the LANL Center for Human Genome Studies, contact Robert K. Moyzis, Director, or Larry Deaven, Deputy Director, at 505/667-3912.

Program Management Infrastructure

DOE OHER Mission

Genetics and radiation biology have been a long-term concern of the DOE Office of Health and Environmental Research (OHER) and DOE forerunners, the Atomic Energy Commission (AEC) and the Energy Research and Development Administration (ERDA). In the United States, the first federal support for genetics research was through AEC. In the early days of nuclear energy development, the focus was on radiation effects and later broadened under ERDA and DOE to include the health implications of all energy technologies and their by-products (see "Enabling Legislation" in box below). Today, an extensive program of OHER-sponsored research on genomic structure, maintenance, damage, and repair continues at the national laboratories and universities.
Some major components of OHER genetics research are (1) molecular cloning and characterization of DNA repair genes, (2) improvement of methodologies and resources for quantitating and characterizing mutations, and (3) the focused resource and technology development needed to map and sequence the human genome (the Human Genome Program).

Enabling Legislation

The Atomic Energy Act of 1946 (P.L. 79-585) provided the initial charter for a comprehensive program of research and development related to the utilization of fissionable and radioactive materials for medical, biological, and health purposes.
The Atomic Energy Act of 1954 (P.L. 83-703) further authorized AEC "to conduct research on the biologic effects of ionizing radiation."
The Energy Reorganization Act of 1974 (P.L. 93-438) provided that responsibilities of ERDA shall include "engaging in and supporting environmental, biomedical, physical and safety research related to the development of energy resources and utilization technologies."
The Federal Nonnuclear Energy Research and Development Act of 1974 (P.L. 93-577) authorized ERDA to conduct a comprehensive nonnuclear energy research, development, and demonstration program to include the environmental and social consequences of the various technologies.
The DOE Organization Act of 1977 (P.L. 95-91) instructed the department "to assure incorporation of national environmental protection goals in the formulation and implementation of energy programs; and to advance the goal of restoring, protecting, and enhancing environmental quality, and assuring public health and safety," and to conduct "a comprehensive program of research and development on the environmental effects of energy technology and programs."

Human exposure to environmental factors and the body's response to such factors are a major concern.
Unavoidable genome-damaging agents in the environment include natural radiation sources, such as the components of sunlight, cosmic rays from space, and radon from the earth. Both inorganic and organic chemicals, some natural to the environment and others generated by human commerce and energy-related processes, put people at risk. Normal biological functions also contribute to the risk of genetic damage when the body's own cells produce potentially damaging molecules in the course of metabolic processes such as defensive actions against microbes, detoxification of harmful environmental substances, and cell proliferation. Even DNA is not completely stable chemically; its normal methylcytosine constituent has a low but measurable rate of spontaneous mutagenic change. Systems that reverse many types of DNA damage have evolved to include a wide range of repair mechanisms within cells of all species. Humans show great diversity in this capacity, with repair-gene deficiencies showing up as sensitivity to DNA damage from low-level radiation and in diseases such as cancer. Some human genes that contribute to DNA repair processes have been characterized, and others await detection and molecular cloning. A goal of the OHER program is to improve the capabilities for diagnosing individual susceptibility to genome damage. The genome program is providing fundamental information about the linear structure of chromosomes and genes, but understanding gene function requires other types of knowledge. Elucidating the three-dimensional (3-D) structure of proteins is crucial in explicating their functions. To advance these studies, several unique facilities for 3-D microstructure research, developed and maintained at DOE laboratories (see box on DOE facilities), are increasingly in demand by molecular biologists. 
To carry out its national research and development obligations, OHER conducts the following activities:

* Sponsors research and development projects at universities, in the private sector, and at DOE national laboratories;
* Uses the unique capabilities of multidisciplinary DOE national laboratories for the nation's benefit;
* With advice from the scientific community and other sectors of government, considers novel, beneficial initiatives; and
* Provides expertise on various governmental working groups.

David J. Galas has directed OHER, an office of the DOE Office of Energy Research, since April 1990. He also serves under the White House Office of Science and Technology Policy as Cochair of the Committee on Life Sciences and Health and as Chairman of its Subcommittee on Biotechnology Research. John C. Wooley became OHER Deputy Associate Director in June 1992.

The Human Genome Program, conceived as an Initiative within OHER, is administered primarily through the Health Effects and Life Science Research Division, directed by David A. Smith. The Medical Applications and Biophysical Research Division, directed by Robert W. Wood, monitors the instrumentation sector of the Human Genome Program and, more broadly, sponsors research and development of resources and instrumentation having biomedical and biotechnological applications.

Major DOE Facilities and Resources Relevant to Molecular Biology Research

Center for X-Ray Optics                              LBL
GenBank(R) Data Sequence Repository                  LANL
High Flux Beam Reactor                               BNL
Los Alamos Neutron Scattering Center                 LANL
National Flow Cytometry Resource                     LANL
National Laboratory Gene Library Project             LANL, LLNL
Protein Structure Data Bank                          BNL
National Synchrotron Light Source                    BNL
Scanning Transmission Electron Microscope Resource   BNL
Stanford Synchrotron Radiation Laboratory            Stanford
GRAIL, Online Sequence Interpretation Service        ORNL

Program Management Task Group

The Human Genome Program Management Task Group (see box for list of members) reports to the OHER Director and works to coordinate the following within OHER:

* peer review of research proposals, using both prospective and retrospective evaluations and
* administration of awards, collaboration with all concerned agencies and organizations, organization of periodic workshops, and responses to the needs of the developing program.

DOE Human Genome Program Management Task Group in 1992

David A. Smith, Chair    Molecular biologist
Ann M. Barber            Computational biologist
Benjamin J. Barnhart     Geneticist
Daniel W. Drell          Biologist
Gerald Goldstein         Physical scientist
Murray Schulman          Radiation biologist
Jay Snoddy*              Molecular biologist
Marvin Stodolsky         Molecular biologist
John C. Wooley           Biophysicist

*On detail from Argonne National Laboratory.

Field Coordination

Human Genome Coordinating Committee (HGCC)

Another component of the OHER management structure, HGCC was formed in October 1988 to represent DOE genome program researchers along with observers from other government and private agencies (see box for list of HGCC members). Members of the Human Genome Program Management Task Group are ex-officio members of HGCC, and they participate in the regularly scheduled HGCC meetings.
HGCC responsibilities include the following:

* assisting OHER with overall coordination of DOE-funded genome research;
* facilitating the development and dissemination of novel genome technologies;
* ensuring proper management and sharing of data and samples;
* participating with other national and international efforts; and
* recommending establishment of ad hoc task groups to analyze specific areas, such as ethical, legal, and social issues; informatics requirements; mapping and sequencing technologies; use of the mouse as a model organism; cost of resource distribution; and use of chromosome flow-sorting facilities.

Human Genome Coordinating Committee Members in 1992

Elbert W. Branscomb, Computational Biologist, Human Genome Center, Lawrence Livermore National Laboratory
Charles R. Cantor, Principal Scientist, DOE Human Genome Program, Lawrence Berkeley Laboratory
Anthony V. Carrano, Director, Human Genome Center and Leader, Biomedical Sciences Division, Lawrence Livermore National Laboratory
C. Thomas Caskey, Director, Institute for Molecular Genetics, Baylor College of Medicine
David J. Galas, Office of Health and Environmental Research, DOE
Raymond F. Gesteland, Professor and Cochair, Department of Human Genetics, University of Utah; Investigator, Howard Hughes Medical Institute Laboratory for Genetic Studies at the Eccles Institute, University of Utah
Leroy E. Hood, Director, Center for Integrated Protein and Nucleic Acid Chemistry and Biological Computation; Director, Cancer Center, California Institute of Technology
Robert K. Moyzis, Director, Center for Human Genome Studies, Los Alamos National Laboratory
Jasper Rine, Director, Human Genome Center, Lawrence Berkeley Laboratory
Robert J. Robbins, Director, Welch Medical Library for Applied Research in Academic Information, Johns Hopkins University
David A. Smith, Office of Health and Environmental Research, DOE
Lloyd M. Smith, Assistant Professor, Analytical Division, Department of Chemistry, University of Wisconsin, Madison
John C. Wooley, Office of Health and Environmental Research, DOE

HGCC Executive Officer: Sylvia J. Spengler, Deputy Director, Human Genome Center, Lawrence Berkeley Laboratory

A Principal Scientist is a member of HGCC, reports to the Human Genome Program Task Group regarding the responsibility of keeping the program at the leading edge of genome research, and conveys recommendations on broad scientific policies to HGCC. Currently serving as a Principal Scientist is Charles R. Cantor, Lawrence Berkeley Laboratory.

Human Genome Management Information System (HGMIS)

As an aid to the DOE Human Genome Program Task Group, communication and information services are provided by HGMIS at Oak Ridge National Laboratory. In this role HGMIS facilitates international communication among management and research personnel and informs other interested persons about genome research. HGMIS publications, such as the bimonthly newsletter Human Genome News and technical and program reports, are available to anyone interested in the genome project. Human Genome News is jointly supported by OHER and the NIH National Center for Human Genome Research (NCHGR).

Subscribers to the newsletter number over 13,000 and include genome and basic researchers at national laboratories, universities, and other research institutions; professors and teachers; industry representatives; legal personnel; ethicists; students; genetic counselors; physicians; the press; and other interested individuals. In the first quarter of 1992, over 5000 Genome Data Base users were added to the mailing list. Subscribers outside the United States include more than 3000 individuals and institutions in 48 countries.
Human Genome Distinguished Postdoctoral Fellowships

In 1990 OHER established the Human Genome Distinguished Postdoctoral Research Program to support research on projects related to the DOE Human Genome Program. The postdoctoral program developed from a 1988 recommendation of the DOE Energy Research Advisory Board to "increase support through expansion of the targeted (science and engineering) graduate and postgraduate research fellowship programs with emphasis given to energy-related areas of greatest projected human resource shortages." Recipients of the first fellowships, awarded in FY 1991, are listed below.

1991 DOE Human Genome Distinguished Postdoctoral Fellows*

Xiaohua Huang (Stanford University, Biophysical Chemistry)
    Host: University of California, Berkeley
Ben Koop (Wayne State University, Molecular Biology and Genetics)
    Host: California Institute of Technology
Carol Soderlund (New Mexico State University, Computer Science)
    Host: Los Alamos National Laboratory
Harold Swerdlow (University of Utah, Bioengineering)
    Host: University of Utah

*Contact: Linda Holmes: 615/576-3192, Fax: 615/576-0202.

Fellowship appointments are tenable at DOE and university laboratories having substantial DOE-sponsored research projects supportive of the Human Genome Program. Fellows will participate in advanced genetics-related research, interact with outstanding professionals, and become familiar with major issues while making personal contributions to the program's goal of mapping and sequencing the human genome. This interaction, involving the exchange of ideas, skills, and technologies, will benefit the fellow, the host laboratory, and the DOE program.

These fellowships complement the Alexander Hollaender Distinguished Postdoctoral Fellowships initiated by OHER. The Hollaender Fellowships, established in memory of the 1983 recipient of the prestigious DOE Enrico Fermi Award, provide support in all areas of OHER-sponsored research.
Both postdoctoral programs are administered by Oak Ridge Associated Universities, which is a university consortium and DOE contractor.

Resource Allocation

Reports by the Health and Environmental Research Advisory Committee (HERAC) and the National Research Council (NRC) recommended that national funding for the Human Genome Project increase to a sustaining yearly level of $200 million. DOE program expenditures were $5.5 million in FY 1987, $10.7 million in FY 1988, $17.5 million in FY 1989, $25.9 million in FY 1990, $46 million in FY 1991, and $59 million in FY 1992. The proposed presidential budget for the DOE Human Genome Program in FY 1993 is $64.7 million (graph). DOE-sponsored research is conducted in a variety of institutions (upper table). The lower table categorizes research expenditures for FY 1992.

Types of Institutions Conducting DOE-Sponsored Genome Research

 8  National laboratories
 3  Other federal organizations
41  Academic institutions
10  Private-sector institutions
12  Nonacademic, commercial organizations

Human Genome Program Funds Distribution in FY 1992 (in $K)
(Commitments as of May 1, 1992)

------------------------------------------------------------------------------------------
Organization Type       Mapping &   Instrumentation  Informatics    ELSI   Totals  Percent of
                        Sequencing  Development                                    56800^1
------------------------------------------------------------------------------------------
DOE Labs                   23671         7559            5122        236    36588     64.4
Academic                    5462         3341            4528        736    14067     24.8
Institutions (nonprofit)    2173            0             602        847     3622      6.4
NIH Labs                     680            0               0          0      680      1.2
Companies and SBIR^2        1550            0             314        392     2256      3.9
All Organizations          33536        10900           10566       2211    57213
[Percent of 56800]        [59.0]       [19.2]          [18.6]      [3.9]  [100.7]^3
------------------------------------------------------------------------------------------

1 Total allocation of $59 million less capital equipment funds of $2.2 million.
2 Small Business Innovation Research grants.
3 Excess occurs because funding for genome SBIR projects is received from the DOE-wide SBIR program, to which OHER contributes.

Interagency Coordination

Joint DOE-NIH Activities

The NIH Human Genome Program, led by NIH NCHGR, has emphasized the study of disease genes in the construction of complete genetic and physical maps of the genomes of humans and selected model organisms. NIH is also developing new technologies and information systems to manage mapping and sequencing data. In the fall of 1988 DOE and NIH began coordinating their human genome research programs under the Memorandum of Understanding, an outgrowth of the HERAC and NRC reports, "to foster interagency cooperation that will enhance the human genome research capabilities of both agencies." More information on NCHGR-sponsored projects and infrastructure may be obtained by contacting the NCHGR Office of Communications at 301/402-0911.

Joint DOE-NIH Subcommittee on the Human Genome in 1992

Cochairs:
Paul Berg (PACHG), Stanford University School of Medicine
Sheldon Wolff (HERAC), University of California, San Francisco

Charles R. Cantor, Lawrence Berkeley Laboratory (HGCC)
Anthony V. Carrano, Lawrence Livermore National Laboratory (HGCC)
Joseph L. Goldstein, University of Texas Southwestern Medical Center
Leroy E. Hood, California Institute of Technology
Leonard S. Lerman, Massachusetts Institute of Technology (HERAC)
Victor A. McKusick, Johns Hopkins Hospital
Robert K. Moyzis, Los Alamos National Laboratory (HGCC)
Maynard V. Olson, Washington University School of Medicine (PACHG)
MaryLou Pardue, Massachusetts Institute of Technology (HERAC)
Mark L. Pearson, E. I. du Pont de Nemours & Company (PACHG)
Diane C. Smith, Xerox Corporation (PACHG)
Robert T. Tjian, University of California, Berkeley
Nancy S. Wexler, Columbia University (PACHG)
John C. Wooley, Office of Health and Environmental Research, DOE

Ex Officio Members:
David J. Galas, Office of Health and Environmental Research, DOE
Mark S. Guyer, National Center for Human Genome Research, NIH
Elke Jordan, National Center for Human Genome Research, NIH
David A. Smith, Office of Health and Environmental Research, DOE
Michael Gottesman, National Center for Human Genome Research, NIH

A national plan, primarily authored by NIH and DOE, for a coordinated multiyear research project was presented to Congress in early 1990. Understanding Our Genetic Inheritance, The U.S. Human Genome Project: The First Five Years (1991-1995) detailed a comprehensive spending plan and optimal strategies for mapping and sequencing the human genome. Referred to as the Five Year Plan, it calls for open biannual meetings of the DOE-NIH Joint Subcommittee on the Human Genome. The joint subcommittee invites reports from experts, including those on national and international genome efforts; medical genetics; and related ethical, legal, and social issues as they pertain to data produced in the project.

The subcommittee is made up of members from the NIH Program Advisory Committee on the Human Genome (PACHG) and from the DOE HERAC or the HGCC members appointed by HERAC. The subcommittee reports to its parent committees, PACHG and HERAC. Many workshops and meetings have since been cosponsored by the two agencies (see Appendix B).

In addition, the Joint Subcommittee on the Human Genome has established five joint working groups that meet regularly to address specific areas of genome research and make recommendations to the joint subcommittee. The objectives of these five joint working groups, listed below, include establishing research priorities; identifying research, training, and technical needs; and coordinating U.S. research activities with those of other countries. Members of the working groups represent various disciplines. (Membership lists of the working groups are included in Appendix D.)

Joint Mapping Working Group.
The mapping working group encourages development and use of methodologies to integrate genetic linkage and physical maps, meet project mapping goals, and identify informatics needs associated with map generation and completion.

Joint Informatics Task Force (JITF). An ad hoc committee, JITF prepared a comprehensive report on genome information needs and data analysis tools. The report was presented to the DOE-NIH Joint Subcommittee on the Human Genome in January 1992.

Joint Sequencing Working Group. The sequencing working group investigates and makes recommendations on research and technology development priorities to enable the sequencing of 3 billion nucleotides of human DNA within 15 years.

Joint Working Group on Ethical, Legal, and Social Issues (ELSI). ELSI identifies and addresses the social concerns that may arise as genome technology is developed and genetic data becomes available; stimulates bioethics research; promotes education of professional and lay groups; and collaborates with international groups such as the Human Genome Organization (HUGO), United Nations Educational, Scientific, and Cultural Organization (UNESCO), and the European Community (see next section).

Joint Working Group on the Mouse. The mouse working group was established to develop a strategy for efficiently using the mouse to accomplish mapping project goals as outlined in the Five Year Plan. This strategy will take advantage of the extensive genetic map data amassed on the mouse. Because of numerous similarities between mouse and human genomes, these studies are considered essential to understanding human biology and to interpreting more complex data obtained in studies of humans.

Other U.S. Genome Research

U.S. Department of Agriculture (USDA). USDA has implemented a Plant Genome Research Program to foster and coordinate research on single and multigenic traits related to agricultural, forestry, and environmental concerns.
The goal of this 5-year program is to improve plant varieties by locating important genes and markers on chromosomes, determining gene structure, and transferring genes to improve the performance of economically important crops such as corn, wheat, soybeans, and pine. Use of these "molecular breeding" techniques will increase U.S. competitiveness in the world marketplace.

National Science Foundation (NSF). NSF coordinates an interagency research effort to map and sequence the small genome of Arabidopsis thaliana, a simple weed that provides an ideal model for studying plant biochemistry, genetics, and physiology. Knowledge of the function of every Arabidopsis gene will be applicable to the understanding and manipulation of higher plants and to genome research in general. These studies are also supported by DOE, NIH, and USDA as part of their own genome initiatives, and the four agencies coordinate their Arabidopsis activities. NSF also has instrumentation, computational, and informatics programs that support genomics research, in addition to individual awards in genetics and molecular biology.

Howard Hughes Medical Institute (HHMI). HHMI, a private medical research organization, contributes to the genome effort through its support of biomedical research primarily at university molecular biology and genetics laboratories. In addition, HHMI has cosponsored several genomics conferences and, between 1985 and September 1991, supported the collection and dissemination of genome mapping data through a network of databases.

International Coordination

Genomic research is being carried out in countries throughout the world. The two international organizations described below, HUGO and UNESCO, are working to coordinate and facilitate national efforts. HUGO includes a number of DOE and NIH genome investigators and administrators.
HUGO and UNESCO have been informed of dedicated genome programs in the following nations and international agencies: Commonwealth of Independent States (formerly U.S.S.R.), Denmark, European Community, France, Germany, Hungary, Italy, Japan, Netherlands, United Kingdom, and United States.

HUGO: Worldwide Genome Research Coordination

HUGO, formed by scientists to coordinate worldwide genome mapping and sequencing, now has regional offices in the United States (Bethesda, Maryland) and Europe (London) and a satellite office in Moscow. A Pacific office is under development in Osaka, Japan.

HUGO offices were funded initially by several charitable organizations. In 1990 HHMI awarded HUGO a 4-year, $1 million grant to support the HUGO Americas office; in that same year The Wellcome Trust provided a 3-year grant, with the first year's funds amounting to over $400,000, to assist with activities in the European office. The Imperial Cancer Research Fund (U.K.) provides support for the HUGO president's office, and the Osaka office has received private support as well. To support future activities, HUGO directors intend to raise funds from various countries that have active genome research programs.

HUGO members are elected; there are over 400 members from 32 countries. The international officers in 1992: Sir Walter Bodmer (United Kingdom), President; Charles R. Cantor (United States), Vice-President; Andrei Mirzabekov (Russia), Vice-President; Kenichi Matsubara (Japan), Vice-President; Bronwen Loder (United Kingdom), Secretary; and Robert Sparkes (United States), Treasurer. Each office operates with its own trustees.
The objectives of HUGO include

* fostering collaboration to avoid unnecessary competition or duplication of effort and to coordinate human genome research with model organism studies;
* coordinating exchanges of relevant data and materials;
* educating researchers and the public on the scientific, ethical, social, legal, and commercial implications of the research; and
* acting as a clearinghouse for genome-related information, such as relevant conferences, worldwide genome programs and researchers, and database and material availability.

A training program may be initiated to encourage the spread of new and promising technologies. HUGO has established expert international ad hoc advisory committees on mapping workshops and databases, informatics, ethics, mouse mapping, and intellectual property and ownership.

Single-chromosome workshops are crucial to the success of the Human Genome Project. Working with the funding agencies, HUGO is playing a central role in the coordinated development of such meetings and has assisted in planning workshops for chromosomes 2, 3, 13, 16, 19, and X in 1992. HUGO expects to work with the scientific community to select workshop chairs and to assist in fundraising and organizing and running these and future meetings. Chromosome workshops and other meetings are listed in Appendix B.

UNESCO: Promoting the Interests of Developing Countries

A UNESCO Human Genome Program was approved for 1990-91 at the 25th session of the UNESCO General Conference. Attendees concluded that full knowledge of the human genome is vitally important and that UNESCO could be influential in stimulating governments and agencies to support coordinated programs. UNESCO expects to play a key role in promoting the interests of developing countries.
The Scientific Coordinating Committee (SCC), composed of 13 scientists, plans and implements the program, which was budgeted at $350,000 for the first year; SCC members include representatives selected from geographic regions and from international genome organizations such as HUGO. Members of SCC and of the UNESCO Secretariat agreed that UNESCO will concentrate its activities on access to and use of data obtained from human genome mapping and sequencing research, as well as on related ethical and social issues. UNESCO emphasizes the use of training programs as one of the best means of obtaining cooperation and diminishing the gap between developed and developing countries. The Third World Academy of Sciences (TWAS) joined UNESCO in sponsoring a training program that provided 19 fellowships in 1991 to awardees from Algeria, Argentina, Cameroon, Chile, China, Costa Rica, Cyprus, Czechoslovakia, Egypt, Guinea, India, Indones