Title: Improved Analysis of DNA Short Tandem Repeats With Time-of-Flight Mass Spectrometry
Series: Science and Technology Research Report
Author: John M. Butler and Christopher H. Becker
Published: National Institute of Justice, October 2001
Subject: Technology in law enforcement
54 pages
133,000 bytes
---------------------------
Figures, charts, forms, and tables are not included in this ASCII plain-text file. To view this document in its entirety, download the Adobe Acrobat graphic file available from this Web site or order a print copy from NCJRS at 800-851-3420 (877-712-9279 for TTY users).
---------------------------
U.S. Department of Justice
Office of Justice Programs
National Institute of Justice

Improved Analysis of DNA Short Tandem Repeats With Time-of-Flight Mass Spectrometry

science and technology research report
---------------------------
U.S. Department of Justice
Office of Justice Programs
810 Seventh Street N.W.
Washington, DC 20531

John Ashcroft
Attorney General

Office of Justice Programs World Wide Web Site
http://www.ojp.usdoj.gov

National Institute of Justice World Wide Web Site
http://www.ojp.usdoj.gov/nij
---------------------------
Improved Analysis of DNA Short Tandem Repeats With Time-of-Flight Mass Spectrometry

John M. Butler and Christopher H. Becker

Science and Technology Research Report
October 2001
NCJ 188292
---------------------------
NIJ

Sarah V. Hart
Director, National Institute of Justice

Lois Tully
Project Monitor

John M. Butler, Ph.D., is currently a research chemist at the National Institute of Standards and Technology and principal investigator on an NIJ-funded project to further develop multiplex PCR and time-of-flight mass spectrometry for future forensic DNA typing assays. He was the first to demonstrate that short tandem repeat typing could be performed with capillary electrophoresis. Christopher H. Becker, Ph.D., is currently senior director of proteomics technology at Thermo Finnigan in San Jose, California.
During the span of this project, he was president and chief operations officer of GeneTrace Systems, Inc. This project was supported under grant number 97-LB-VX-0003 from the National Institute of Justice, Office of Justice Programs, U.S. Department of Justice. Points of view in this document are those of the authors and do not necessarily represent the official position or policies of the U.S. Department of Justice. This document is not intended to create, does not create, and may not be relied upon to create any rights, substantive or procedural, enforceable at law by any party in any matter, civil or criminal. For further information, contact John M. Butler, National Institute of Standards and Technology, 100 Bureau Drive, Gaithersburg, MD 20899; phone 301-975-4049; e-mail john.butler@nist.gov. The National Institute of Justice is a component of the Office of Justice Programs, which also includes the Bureau of Justice Assistance, the Bureau of Justice Statistics, the Office of Juvenile Justice and Delinquency Prevention, and the Office for Victims of Crime. --------------------------- Acknowledgments The project described in this report could not have happened without the hard work and support of a number of people at GeneTrace Systems, Inc. First and foremost, Jia Li did some of the early primer design and STR work to demonstrate that STRs could be effectively analyzed by mass spectrometry. Jia taught us a lot about PCR and was always encouraging of our work. Likewise, Tom Shaler was important in the early phases of this research with his expert advice in mass spectrometry and data processing. The first GeneTrace STR mass spectra were carefully collected by Tom, and thus he and Jia deserve credit for helping obtain the funding for this study. Dan Pollart synthesized numerous cleavable primers for this project, especially in the first year of our work. David Joo and Wendy Lam also prepared PCR and SNP primers for the later part of this work. 
A number of people assisted in robotic sample preparation and sample cleanup, including Mike Abbott, Jon Marlowe, David Wexler, and Rebecca Turincio. Joanna Hunter, Vera Delgado, and Can Nhan ran many of the STR samples on the automated mass spectrometers. Their hard work made it possible to focus on experimental design and data analysis rather than routine sample handling. It was a great blessing to have talented and supportive coworkers throughout the course of this project. Kathy Stephens, Jia Li, Tom Shaler, Yuping Tan, Christine Loehrlein, Joanna Hunter, Hua Lin, Gordy Haupt, and Nathan Hunt provided useful discussions on a number of issues and helped develop assay parameters and tackle automation issues, among other things. Nathan Hunt was especially important to the success of this project because he developed the STR genotyping algorithm and CallSSR software as well as the multiplex SNP primer design software. Kevin Coopman developed the SNP genotyping algorithm and calling software and was always eager to analyze our multiplex SNP samples. Joe Monforte and Roger Walker served as our supervisors for the first year and second year of this project, respectively, which allowed us the opportunity to devote sufficient time to doing the work described in this project. Last but not least, Debbie Krantz served as an able administrator of these two NIJ grants and took care of the financial aspects. We also were supported with samples and sequence information from a number of scientific collaborators. Steve Lee and John Tonkyn from the California Department of Justice DNA Laboratory provided genomic DNA samples and STR allelic ladders. Debang Liu from Northwestern University provided the D3S1358 DNA sequence used for improved primer design purposes. Peter Oefner and Peter Underhill from the Department of Genetics at Stanford University provided male population samples and Y-chromosome SNP sequences. 
The encouragement and support of Lisa Forman and Richard Rau from the Office of Justice Programs at the National Institute of Justice propelled this work from an idea to a working product. In addition, Dennis Reeder from the National Institute of Standards and Technology was a constant source of encouragement at scientific meetings.
---------------------------
Contents

Acknowledgments

Executive Summary
o Introduction
o Purpose of the Report
o Short Tandem Repeats
o Single Nucleotide Polymorphisms
o Conclusions and Implications

Project Description
o STR Grant
o SNP Grant

Scope and Methodology
o Assay Development and Primer Testing
o Sample Cleanup and Mass Spectrometry
o Sample Genotyping
o Comparison Tests With ABI 310 Genetic Analyzer

Results and Discussion of STR Analysis by Mass Spectrometry
o Marker Selection and Feasibility Studies With STR Loci
o Multiplex STR Work
o Comparison Tests Between ABI 310 and Mass Spectrometry Results
o PCR Issues
o Analytical Capabilities of This Mass Spectrometry Method

Results and Discussion of Multiplex SNPs
o Mitochondrial DNA Work
o Y-Chromosome Work

References

Published Papers and Presentations
---------------------------
Executive Summary

Introduction

The advent of DNA typing and its use for human identity testing has revolutionized law enforcement investigations in recent years by allowing forensic laboratories to match suspects with minuscule amounts of biological evidence from a crime scene. Equally important is the use of DNA to exclude suspects who were not involved in a crime or to identify human remains in an accident. The past decade has seen numerous advances in DNA testing procedures, most notably the development of PCR (polymerase chain reaction)-based DNA typing methods. Technologies for measuring DNA variations, both length and sequence polymorphisms, have also advanced rapidly in the past decade.
The time needed to determine a sample's DNA profile has dropped from 6-8 weeks to 1-2 days, and with more recent advancements, the time needed to process samples may decrease to as little as a few hours, maybe even a few minutes. Simultaneous with the evolution of DNA markers and technologies embraced by the forensic community has been the acceptance and use of DNA typing information. The courtroom battles over statistical issues that were common in the late 1980s and early 1990s have subsided as DNA evidence has become more widely accepted. In the past 5 years, DNA databases have emerged as powerful tools for criminal investigations, much like the fingerprint databases that have been used routinely for decades. The United Kingdom launched a nationwide DNA database in 1995 that now contains more than 1 million DNA profiles of convicted felons-- profiles that have been used to aid more than 75,000 criminal investigations. National DNA databases are springing up in countries all over the world as their value to law enforcement is being recognized. In the United States, the FBI has developed the Combined DNA Index System (CODIS) with the anticipation that several million DNA profiles will be entered into this database in the next decade. All 50 States now have laws requiring DNA typing of convicted offenders, typically for violent crimes such as rape or homicide. While the law enforcement community is gearing up to gather millions of DNA samples from convicted felons, the DNA typing technology needs improvement. Large backlogs of samples exist today due to the high cost of performing the DNA testing and limited capabilities in forensic laboratories. As of the summer of 1999, several States, including California, Virginia, and Florida, had backlogs of more than 50,000 samples. A need exists for more rapid and cost-effective methods for high-throughput DNA analysis to process samples currently being gathered for large criminal DNA databases around the world. 
At the start of this project in June 1997, commercially available slab gel or capillary electrophoresis instruments could handle only a few dozen samples per day. While larger numbers of samples can be processed by increasing the number of laboratory personnel and instruments, the development of high-throughput DNA processing technologies promises to be more cost effective in the long run, especially for the generation of large DNA databases. GeneTrace Systems, Inc., a small biotechnology company located in Alameda, California, has developed high-throughput DNA analysis capabilities using time-of-flight mass spectrometry coupled with parallel sample preparation on a robotic workstation. The GeneTrace technology allows several thousand samples to be processed daily. DNA samples can be analyzed in seconds, rather than minutes or hours, and with improved accuracy compared with conventional electrophoresis methods. Overall, the mass spectrometry method described in this study is two orders of magnitude faster in sample processing time than conventional techniques. Purpose of the Report This NIJ project was initiated to adapt the GeneTrace technology to human identity DNA markers commonly used by forensic DNA laboratories, specifically short tandem repeat (STR) markers. An extension of the original grant was submitted in December 1997 to fund the development of single nucleotide polymorphism (SNP) markers from mitochondrial DNA and the Y chromosome. Based on the results obtained in this study, the authors believe mass spectrometry can be a useful and effective means for high-throughput DNA analysis, and that it has the capabilities to meet the needs of the forensic DNA community for offender DNA databases. However, due to limited resources and a perceived difficulty to enter the forensic DNA market, GeneTrace made a business decision to not pursue this market. 
While the STR milestones on the original grant were met, only the initial milestones were achieved on the SNP portion of the NIJ grant because of the premature termination on the part of GeneTrace. GeneTrace Systems, Inc., developed an integrated high-throughput DNA analysis system involving the use of proprietary chemistry, robotic sample manipulation, and time-of-flight mass spectrometry. The purpose of this NIJ project was to apply the GeneTrace technology to improve the analysis of STR markers commonly used in forensic DNA laboratories. Mass spectrometry is a versatile analytical technique that involves the detection of ions and the measurement of their mass-to-charge ratio. Because these ions are separated in a vacuum environment, analysis times can be extremely rapid, often within microseconds. Many advances have been made in the past decade for the analysis of biomolecules such as DNA, proteins, and carbohydrates since the introduction of a new ionization technique known as matrix-assisted laser desorption-ionization (MALDI) and the discovery of new matrixes that effectively ionize DNA without extensive fragmentation. When coupled with time-of-flight mass spectrometry, this method for measuring biomolecules is commonly referred to as MALDI-TOF-MS. A schematic of MALDI-TOF-MS is presented in exhibit 1. Short Tandem Repeats Short tandem repeat (STR) DNA markers, also referred to as microsatellites or simple sequence repeats (SSRs), consist of tandemly repeated DNA sequences with a core repeat of 2-6 base pairs (bp). STR markers are readily amplified during PCR by using primers that bind in conserved regions of the genome flanking the repeat region. Forensic laboratories prefer tetranucleotide loci (i.e., 4 bp in the repeat) due to the lower amount of "stutter" produced during PCR. (Stutter products are additional peaks that can complicate the interpretation of DNA mixtures by appearing in front of regular allele peaks.) 
The number of repeats can vary from 3 or 4 repeats to more than 50 repeats with extremely polymorphic markers. The number of repeats, and hence the size of the PCR product, may vary among samples in a population making STR markers useful in identity testing or genetic mapping studies. Shortly after this project was initiated, the FBI designated 13 core STR loci for the nationwide CODIS database. These STR loci are TH01, TPOX, CSF1PO, VWA, FGA, D3S1358, D5S818, D7S820, D13S317, D16S539, D8S1179, D18S51, and D21S11. The sex-typing marker, amelogenin, is also included in STR multiplexes that cover the 13 core STR loci. Each sample must have these 14 markers tested to be entered into CODIS. To illustrate the kinds of numbers involved to analyze the current national sample backlog of about 500,000 samples, more than 7 million genotypes must be generated. Using currently available technologies, an estimated $25 million (about $50/sample) and more than 5 years for well-trained and well-funded laboratories would be required to determine those 7 million genotypes. With the high cost and effort required, most of these backlogged samples are being stored in anticipation of future analysis and inclusion in CODIS, pending the development of new, faster technology or the implementation of more instruments using the current electrophoresis technologies. Time-of-flight mass spectrometry has the potential to bring DNA sample processing to a new level in terms of high-throughput analysis. However, there are several challenges to using MALDI-TOF-MS for the analysis of PCR products, such as STR markers. Mass spectrometry resolution and sensitivity are diminished when either the DNA size or the salt content of the sample is too large. By redesigning the PCR primers to bind close to the repeat region, the STR allele sizes are reduced to benefit the resolution and sensitivity of the PCR products. 
Therefore, much of this project involved designing and testing new PCR primers that produced smaller amplicon sizes for STR markers of forensic interest. This research focused on STR loci that have been developed by commercial manufacturers and studied extensively by forensic scientists. These include all of the GenePrint[TM] tetranucleotide STR systems from Promega Corporation (Madison, WI) as well as the 13 CODIS STR loci that are covered by the Profiler Plus[TM] and COfiler[TM] kits from Applied Biosystems (ABI) (Foster City, CA) (exhibit 2). Where possible, primers were designed to produce amplicons less than 100 bp in size, although it has been possible to resolve neighboring STR alleles as large as 140 bp. For example, TPOX alleles 6-14 ranged from 69 to 101 bp in size with GeneTrace-designed primers, whereas with Promega's GenePrint[TM] primers the same TPOX alleles ranged in size from 224 to 256 bp. Unfortunately, due to the long and complex repeat structures of several STR markers, this study was unable to obtain the necessary single-base resolution with the following STR loci: D21S11, D18S51, and FGA (see Results and Discussion of STR Analysis by Mass Spectrometry). To verify the STR results obtained from the mass spectrometry method, the authors collaborated with the California Department of Justice (CDOJ) DNA Laboratory in Berkeley to generate a large data set. CDOJ provided 88 samples that had been previously genotyped using validated fluorescent multiplex STR kits from ABI. GeneTrace generated STR results using its primer sets for 9 STR loci (TH01, TPOX, CSF1PO, D3S1358, D16S539, D8S1179, FGA, DYS391, and D7S820) along with the sex-typing marker amelogenin. These experiments compared more than 700 genotypes (88 samples x 8 loci; data from D8S1179 and DYS391 were not available from CDOJ).
Although results were not obtained for all possible samples using mass spectrometry, researchers observed almost 100% correlation with the genotypes obtained between the validated fluorescent STR method and GeneTrace's newly developed mass spectrometry technique, demonstrating that the GeneTrace method was reliable (see Results and Discussion of STR Analysis by Mass Spectrometry). Multiplex STR analysis To reduce analysis cost and sample consumption and to meet the demands of higher sample throughputs, PCR amplification and detection of multiple markers (multiplex STR analysis) has become a standard technique in most forensic DNA laboratories. STR multiplexing is most commonly performed using spectrally distinguishable fluorescent tags and/or nonoverlapping PCR product sizes. An example of an STR multiplex produced from a commercially available kit is shown in exhibit 3. Multiplex STR amplification in one or two PCR reactions with fluorescently labeled primers and measurement with gel or capillary electrophoresis separation and laser-induced fluorescence detection is becoming a standard method among forensic laboratories for analysis of the 13 CODIS STR loci. The STR alleles from these multiplexed PCR products typically range in size from 100-350 bp with commercially available kits. Due to the limited DNA size constraints of mass spectrometry, GeneTrace adopted a different approach to multiplex analysis of multiple STR loci. Primers are designed such that the PCR product size ranges overlap between multiple loci but have alleles that interleave and are resolvable in the mass spectrometer (exhibit 4). As described above, PCR primers are closer to the STR repeat regions than those commonly used with electrophoresis systems. The high accuracy, precision, and resolution of this mass spectrometry approach permits multiplexing STR loci for a limited number of markers. During the study, GeneTrace also developed a TH01-TPOX-CSF1PO STR triplex (exhibit 5). 
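The interleaving idea described above can be illustrated with a short sketch. It is not GeneTrace's design software. It assumes tetranucleotide repeats, uses the TPOX product sizes quoted earlier (alleles 6-14 at 69-101 bp, which are full PCR product sizes rather than the post-cleavage fragments actually measured), invents a second locus called LOCUS_B purely for illustration, and converts size gaps to mass with a rough average of about 309 Da per nucleotide of single-stranded DNA.

```python
from itertools import combinations

# Rough average mass of one nucleotide in single-stranded DNA (daltons).
# Illustrative approximation only; it is not a figure taken from the report.
AVG_NT_MASS_DA = 309.0


def allele_sizes(smallest_nt, first_repeat, last_repeat, repeat_len=4):
    """Map repeat number -> product size (nt) for one locus."""
    return {
        n: smallest_nt + (n - first_repeat) * repeat_len
        for n in range(first_repeat, last_repeat + 1)
    }


def closest_pair(loci):
    """Return (gap_nt, (locus, allele), (locus, allele)) for the two closest
    products that come from different loci, or None if there is only one locus."""
    products = [
        (size, name, allele)
        for name, sizes in loci.items()
        for allele, size in sizes.items()
    ]
    best = None
    for (s1, n1, a1), (s2, n2, a2) in combinations(products, 2):
        if n1 == n2:
            continue  # alleles of the same locus are always one repeat apart
        gap = abs(s1 - s2)
        if best is None or gap < best[0]:
            best = (gap, (n1, a1), (n2, a2))
    return best


# TPOX sizes come from the text; LOCUS_B is hypothetical, offset by 2 nt.
tpox = allele_sizes(69, 6, 14)
locus_b = allele_sizes(71, 5, 11)

gap_nt, first, second = closest_pair({"TPOX": tpox, "LOCUS_B": locus_b})
print(f"closest products: {first} vs {second}, "
      f"{gap_nt} nt apart (~{gap_nt * AVG_NT_MASS_DA:.0f} Da)")
```

A gap of zero would mean two alleles from different loci coincide and the proposed combination cannot be typed unambiguously; whether a nonzero gap is actually resolvable depends on the instrument's resolution at that mass, which this sketch does not model.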
Single Nucleotide Polymorphisms Single nucleotide polymorphisms (SNPs) represent another form of DNA variation that is useful for human identity testing. SNPs are the most frequent form of DNA sequence variation in the human genome and are becoming increasingly popular genetic markers for genome mapping studies and medical diagnostics. SNPs are typically biallelic with two possible nucleotides (nt) or alleles at a particular site in the genome. Because SNPs are less polymorphic (i.e., have fewer alleles) than the currently used STR markers, more SNP markers are required to obtain the same level of discrimination between samples. Current estimates are that 30-50 unlinked SNPs will be required to obtain the matching probabilities of 1 in about 100 billion as seen with the 13 CODIS STRs. The perceived value of SNPs for DNA typing in a forensic setting include the following: o More rapid analysis. o Cheaper costs. o Simpler interpretation of results because there are no stutter products. o Improved ability to handle degraded DNA because of the possibility of smaller PCR product sizes. While it is doubtful that autosomal SNPs will replace the current battery of STRs used in forensic laboratories in the near future, abundant mitochondrial and Y-chromosome SNP markers exist and have already proven useful as screening tools. These maternal (mitochondrial) and paternal (Y chromosome) lineage markers are effective in identifying missing persons and war casualties and helping answer historical questions such as whether or not Thomas Jefferson fathered a slave child. The forensic DNA community already has experience with applying SNP markers as a screening process, which can prove very helpful for excluding suspects from crime scenes. Many crime laboratories still use reverse dot blot technology for analyzing the SNPs from HLA-DQA1 and PolyMarker loci with kits from ABI. In addition, mitochondrial DNA (mtDNA) sequencing is currently performed in some forensic laboratories. 
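The estimate that 30-50 unlinked SNPs are needed to reach a matching probability of about 1 in 100 billion can be checked with a back-of-the-envelope calculation. The sketch below is not the authors' calculation. It assumes Hardy-Weinberg genotype proportions, statistical independence between loci, and the same allele frequency at every SNP, and it takes the random-match probability at one locus to be the sum of the squared genotype frequencies.

```python
import math


def snp_match_probability(p):
    """Chance that two unrelated people share a genotype at one biallelic SNP
    whose alleles have frequencies p and 1 - p (Hardy-Weinberg assumed)."""
    q = 1.0 - p
    genotype_freqs = (p * p, 2.0 * p * q, q * q)
    return sum(f * f for f in genotype_freqs)


def snps_needed(p, target):
    """Smallest number of independent SNPs, all with allele frequency p,
    whose combined match probability is at or below `target`."""
    per_locus = snp_match_probability(p)
    return math.ceil(math.log(target) / math.log(per_locus))


target = 1e-11  # about 1 in 100 billion, the figure quoted for the 13 CODIS STRs
for p in (0.5, 0.3, 0.2):
    print(f"allele frequency {p}: per-locus match probability "
          f"{snp_match_probability(p):.4f}, SNPs needed {snps_needed(p, target)}")
```

Under these assumptions an ideal 50/50 SNP gives a per-locus match probability of 0.375, so 26 such markers would suffice. Real markers are rarely that balanced, and more skewed allele frequencies push the count into the 30-50 range quoted in the text.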
In the work performed on multiplex SNP markers at GeneTrace, the authors examined 10 polymorphic sites within the mtDNA control region and 20 Y-chromosome SNPs provided by Dr. Peter Oefner and Dr. Peter Underhill from Stanford University. A multiplex SNP assay was developed for 10 mtDNA SNP sites (exhibit 6). Only limited work was performed on the Y-chromosome SNPs due to the premature termination of the work. However, results demonstrated a male-specific 17-plex PCR of 17 different Y SNP markers (exhibit 7). Conclusions and Implications Time-of-flight mass spectrometry offers a rapid, cost-effective alternative for genotyping large numbers of samples. Each DNA sample can be accurately measured in a few seconds. Due to the increased accuracy of mass spectrometry, STR alleles can be reliably typed without comparison with allelic ladders. Mass spectrometry holds significant promise as a technology for high-throughput DNA processing that will be valuable for large-scale DNA database work. In summary, the positive features of mass spectrometry for STR analysis include: o Rapid results--STR typing at a rate of seconds per sample. o Accuracy--no allelic ladders. o Direct DNA measurement--no fluorescent or radioactive labels. o Automated sample preparation and data collection. o High-throughput capabilities of thousands of samples daily per system. o Flexibility--single nucleotide polymorphism (SNP) assays can be run on the same instrument platform. This project demonstrates that both STR and SNP analysis are reliably performed with GeneTrace's mass spectrometry technology. Tests were done on a large number of human DNA markers of forensic interest. New primer sets were developed for the 13 CODIS STR loci that may prove useful in the future for situations in which degraded DNA is present and requires smaller amplicons to obtain successful results. 
The possibility of developing multiplexed SNP markers also was explored, and a mtDNA 10-plex assay and Y-chromosome, 17-plex, male-specific PCR were demonstrated. Both STR and SNP areas appear promising for future research. In another project, GeneTrace recently demonstrated a sample throughput of approximately 4,000 STR samples in a single day with a single automated mass spectrometer. Clearly, this is an improvement in the analysis of DNA short tandem repeat markers using time-of-flight mass spectrometry. --------------------------- Project Description STR Grant This project focused on the development of a powerful new technology for rapid and accurate analysis of DNA STR markers using time-of-flight mass spectrometry. GeneTrace Systems, Inc., collaborated with the CDOJ DNA Laboratory in Berkeley, California, primarily through Dr. Steve Lee. This collaboration provided the study with the samples used to verify the new GeneTrace technology, which was done by comparing the mass spectrometry results with genotypes obtained using established and validated methods run at CDOJ. To accomplish the task of developing a new mass spectrometry technology for STR typing, five milestones were proposed in the original grant application, which included the following: o Redesign PCR primers for a number of commonly used STR markers to produce smaller PCR products that could be tested in the mass spectrometer (exhibit 2). o Demonstrate multiplexing capabilities to a level of 2 or 3 for detection with the TOF-MS method (exhibits 4-5 and 8-10). o Transfer the sample preparation protocols from manual to a highly parallel and automated pipetting robot. o Develop a large data set to confirm the accuracy and reliability of this method (exhibits 11-19). o Automate and incorporate DNA extraction techniques onto the GeneTrace robots. As described in the results section, all milestones were met on time except the final one regarding DNA extraction. 
Two other companies, Rosys and Qiagen, produced robotic systems for DNA extraction after this project began. Meanwhile, GeneTrace remained focused on developing other steps in DNA sample processing since commercially available solutions had already been developed, thereby eliminating the need to include the DNA extraction portion in this study. Since this project began in June 1997, a number of advances that impact the ability to perform high-throughput DNA typing have occurred in the biotech field. In early 1998, ABI released a dual 384-well PE9700 ("Viper") thermal cycler, which makes it possible to prepare 768 PCR samples simultaneously. Beckman Instruments also came to market with a 96-tip Multimek pipetting robot. At the beginning of this project, GeneTrace used funds from this NIJ grant to purchase an MJ Research 384-well thermal cycler and a custom-built 96-tip robotic pipettor on a CyberLab x-y-z gantry. Both of these pieces of equipment were the state of the art at the time but are now obsolete at GeneTrace for routine operations and have been replaced by the newer and more reliable products from Applied Biosystems and Beckman Instruments. SNP Grant The grant extension, which began in August 1998, focused on the development of multiplexed SNP markers from mtDNA and the Y chromosome. Although the grant extension was terminated prematurely by GeneTrace management in April 1999, portions of the first four milestones were accomplished. The five milestones described in the original grant extension included the following: o Produce and test a set of 10 or more SNP probes for mtDNA control region "hot spots" (exhibit 5). o Develop software for multiplex SNP analysis and data interpretation. o Examine individual Y-chromosome SNP markers. o Develop multiplex PCR and multiplex SNP probes for Y-chromosome SNP loci (exhibit 7). o Determine the discriminatory power for a set of Y-chromosome markers by running about 300 samples across 50 Y-chromosome SNP markers. 
The goal of the grant extension project was to develop highly multiplexed SNP assays that worked in a robust manner with mass spectrometry and could be genotyped in an automated fashion. GeneTrace planned to select markers with a high degree of discrimination to aid in rapid screening of mitochondrial DNA and Y-chromosome polymorphisms with the capability to handle analysis of large databases of offender DNA. At the time this proposal was written (December 1997), GeneTrace still intended to provide reagents and instruments to large DNA service laboratories or to provide a DNA typing service to the forensic DNA community. The grant extension was prematurely terminated due to a change in business focus and a need to consolidate the research efforts at GeneTrace. --------------------------- Scope and Methodology Both the STR and the SNP genotyping assays used in this project involve the same fundamental (proprietary) sample preparation chemistry. This chemistry was important for salt reduction/removal prior to the mass spectrometric analysis and was automated on a 96-tip robotic workstation. A biotinylated, cleavable oligonucleotide was used as a primer in each assay and was incorporated through standard DNA amplification (i.e., PCR) methodologies into the final product, which was measured in the mass spectrometer. This process was covered by U.S. Patent 5,700,642, which was issued in December 1997, and is described in more detail in U.S. Patent 6,090,588 (Butler et al., 2000). The STR assay is schematically illustrated in exhibit 20 and involves a PCR amplification step where one of the primers is replaced by the GeneTrace cleavable primer. The biotinylated PCR product was then captured on streptavidin-coated magnetic beads for post-PCR sample cleanup and salt removal followed by mass spectrometry analysis. 
The biology portion of the SNP assay, on the other hand, involves a three-step process: (1) PCR amplification, (2) phosphatase removal of nucleotides, and (3) primer extension, using the GeneTrace cleavable primer, with dideoxynucleotides for single-base addition of the nucleotide(s) complementary to the one(s) at the SNP site (Li et al., 1999). The SNP assay is illustrated in exhibit 21. Simultaneous analysis of multiple SNP markers (i.e., multiplexing) is possible by simply putting the cleavage sites at different positions in the various primers so they do not overlap on a mass scale. Also important to both genotyping assays is proprietary calling software that was developed (and evolved) during the course of this work. A number of STR and SNP markers were developed and tested with a variety of human DNA samples as part of this project to demonstrate the feasibility of this mass spectrometry approach.

Assay Development and Primer Testing

Primer design

Primers were initially designed for each STR locus using Gene Runner software (Hastings Software, Inc., Hastings, NY) and then more recently with Primer 3 version 0.2 from the World Wide Web (http://www-genome.wi.mit.edu/cgi-bin/primer/primer3_www.cgi) (Rozen and Skaletsky, 1998). Multiplex PCR primers for the multiplex SNP work were designed with a UNIX version of Primer 3 (release 0.6) adapted at GeneTrace to utilize a mispriming library and Perl scripts for input of sequences and export of primer information. DNA sequence information was obtained from GenBank (http://www.ncbi.nlm.nih.gov) and STRBase (http://www.cstl.nist.gov/biotech/strbase) for the STR loci and mtDNA and from Dr. Peter Underhill of Stanford University for the Y-chromosome SNPs. These sequences served as the reference sequence for primer design and, in the case of STRs, the calibrating mass for the genotyping software (see below).
When possible, primers were placed close to the repeat region to make the PCR product size ranges under 120 bp to improve the sensitivity and resolution in the mass spectrometer (exhibit 2). Previously published primers were used in the case of amelogenin (Sullivan et al., 1993), D3S1358 (Li et al., 1993), CD4 (Hammond et al., 1994), and VWA (Fregeau and Fourney, 1993) because their PCR product sizes were analyzable in the mass spectrometer or the amplicons could be reduced in size following the PCR step (see below). Later, D3S1358 experiments were performed with primers that produced smaller products after sequence information became available for that particular STR locus (exhibit 22). Primer synthesis Unmodified primers were purchased from Biosource/Keystone (Foster City, CA) or Operon Technologies (Alameda, CA) or synthesized in-house using standard solid-phase phosphoramidite chemistry. The GeneTrace cleavable primers were synthesized in-house using a proprietary phosphoramidite that was incorporated near the 3' end of the oligonucleotide along with a biotin attached at the 5' end. Primers were quality control tested via mass spectrometry prior to further testing to confirm proper synthesis and to determine the presence or absence of failure products. Synthesis failure products (i.e., n-1, n-2, etc.) can especially interfere with multiplex SNP analysis. The cleavable base is stable during primer synthesis and PCR amplification. Comparisons of regular primers with cleavable primers containing the same base sequence showed no significant difference, indicating that the primer annealing is not compromised by the cleavable base. Methods for STR product size reduction It was discovered early in the study that the PCR primer opposite the biotinylated cleavable primer could be moved into the repeat region as much as two full repeat units to reduce the overall size without severely compromising the PCR reaction. 
For the cleavable primer, the cleavable base was typically placed in the second or third position from the 3' end of the primer in order to remove as much of the modified primer as possible. Thus, the cleavage step reduces the overall PCR product size by the length of the cleavable primer minus two or three nucleotides. Typically, this size reduction is approximately 20 bases. The portion of the DNA product on the other side of the repeat region from the cleavable primer was removed in one of two possible ways: using a restriction enzyme (Monforte et al., 1999) or performing a nested linear amplification with a terminating nucleotide (Braun et al., 1997a and 1997b), such as dideoxynucleotide (ddN). These methods work only for particular situations (see Results and Discussion). Almost all singleplex STR work was performed without either of these product size reduction methods. However, these size reduction methods played a role in the multiplex STR work. Multiplex design STR multiplexes were designed by construction of virtual allelic ladders or "mass simuplexes" that involved the predicted mass of all known alleles for a particular locus. STR markers were then interleaved based on mass with all alleles between loci being distinguishable (exhibit 4). STR multiplexes work best if alleles are below 20,000-25,000 Daltons (Da) in mass due to the improved sensitivity and resolution that is obtainable in the mass spectrometer. As previously described in the section on size reduction, a restriction enzyme or a ddN terminator may be used to shorten the STR allele sizes. For multiplex design, locating a restriction enzyme with cut sites common to all STR loci involved in the multiplex complicates the design process and limits the choice of possible marker combinations. The use of a common dideoxynucleotide terminator is much easier. 
For example, with the STR loci CSF1PO, TPOX, and TH01, a multiplex was developed using a dideoxycytosine (ddC) terminator and primer extension along the AATG strand (exhibits 4 and 5). SNP multiplexes were designed by calculating possible postcleavage primer and extension product masses. Multiply charged ions were abundant in the mass range of 1,500-7,000 Da in SNP multiplex analyses; interference from them was avoided for the most part by calculating where the doubly and triply charged ions would fall. The cleavage sites for candidate multiplex SNP primers were chosen for the least amount of overlap between singly and multiply charged ions (exhibit 23).

Human DNA samples used

Human genomic DNA samples representing several ethnic groups (African-American, European, and Asian) were purchased from Bios Laboratories (New Haven, CT) for the initial studies. K562 cell line DNA (Promega) was used as a control sample since the genotypes for this cell line have been reported for most of the STR loci (GenePrint[TM] STR Systems Technical Manual, 1995). Allelic ladders were reamplified from a 1:1000 dilution of each of the allelic ladders supplied in fluorescent STR kits from ABI using the PCR conditions listed below and the primers shown in exhibit 24. The ABI kits included allelic ladders for the following STR loci: AmpF1STR[R] Green I (CSF1PO, TPOX, TH01, amelogenin), AmpF1STR[R] Blue (D3S1358, VWA, FGA), AmpF1STR[R] Green II (amelogenin, D8S1179, D21S11, D18S51), AmpF1STR[R] Yellow (D5S818, D13S317, D7S820), and AmpF1STR[R] COfiler[TM] (amelogenin, TH01, TPOX, CSF1PO, D3S1358, D16S539, D7S820). While most PCR amplifications were performed with quantitated genomic DNA in liquid form, a few were tested with blood-stained FTA[TM] paper (Life Technologies, Rockville, MD). Sample punches were removed from the dried FTA[TM] paper card with a 1.2 mm Harris MICRO-PUNCH[TM] (Life Technologies).
The recommended washing protocol of 200 micro-L was reduced to 25 or 50 micro-L in order to reduce reagent costs and to work with volumes that are compatible with 96- or 384-well sample plates. The number of washes was kept the same as recommended by the manufacturer, but deionized water was used instead of a 10 mM Tris-EDTA solution. Two studies were performed with larger numbers of DNA samples. In collaboration with Dr. Steve Lee and Dr. John Tonkyn from CDOJ's DNA research laboratory, CDOJ provided a plate of 88 samples, which was used repeatedly for multiple STR markers. These anonymous samples had been previously genotyped by CDOJ using ABI's AmpF1STR[R] Profiler[TM] kit, which consists of AmpF1STR[R] Blue, Green I, and Yellow markers and amplifies 9 STRs and the sex-typing marker amelogenin. STR allelic ladders were also provided by CDOJ and were used to illustrate that the common alleles for each STR locus could be detected with GeneTrace primer sets. Researchers retyped the samples using the AmpF1STR[R] COfiler[TM] fluorescent STR kit, which contains 6 STRs and amelogenin (5 of the 6 STR loci overlap with Profiler loci), thereby providing a further validation of each sample's true genotype. More recently, a set of 92 human DNA templates containing 3 different Centre d'Etude du Polymorphisme Humain (CEPH) families (exhibit 25) and 44 unrelated individuals from the NIH Polymorphism Discovery Resource were examined (Collins et al., 1998). These samples were typed on the ABI 310 Genetic Analyzer using both the AmpF1STR[R] Profiler Plus[TM] and AmpF1STR[R] COfiler[TM] kits so that all 13 CODIS STRs were covered. PCR reaction To speed the development of new STR markers, researchers worked toward the development of universal PCR conditions, in terms of both thermal cycling parameters and reagents used. Since almost all amplifications were singleplex PCRs, development effort was much simpler than multiplex PCR development. 
Generally, all PCR reactions were performed in 20 micro-L volumes with 20 pmol (1 micro-M) each of the forward and reverse primers and a PCR reaction mix containing the remaining components. The early PCR reaction mix contained 1 U Taq polymerase (Promega); 1X STR buffer with deoxynucleotide triphosphates (dNTPs) (Promega); and typically 5, 10, or 25 ng of human genomic DNA. Later in the study, a PCR mix containing 200 micro-M dNTPs, 50 mM KCl, 10 mM Tris-HCl, 5% glycerol, and 2 mM MgCl2 was used. Typically, a locus-specific master mix was prepared by combining 12.8 micro-L of PCR mix times the number of samples (plus about 10% overfill) with 0.2 micro-L AmpliTaq Gold[TM] DNA polymerase (ABI) and the appropriate volume and quantity of forward and reverse primers to bring them to a concentration of 1 micro-M in each reaction. PCR reactions in a 96- or 384-well format were set up manually with an 8-channel pipettor or robotically with a Hamilton 16-tip robot. Thermal cycling was performed in 96- or 384-well MJ Research DNA Engine (MJ Research, Watertown, MA) or 96 or dual block 384 PE9700 (ABI) thermal cyclers. Initial thermal cycling conditions with Taq polymerase (Promega) were as follows:

94 degrees C for 2 min
35 cycles:
   94 degrees C for 30 sec
   50, 55, or 60 degrees C for 30 sec
   72 degrees C for 30 sec
72 degrees C for 5 or 15 min
4 degrees C hold

The final incubation at 72 degrees C favors nontemplated nucleotide addition (Clark, 1988; Kimpton et al., 1993). This final incubation temperature was dropped to 60 degrees C for some experiments in an effort to drive the nontemplated addition even further.
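The master mix arithmetic described above can be sketched in a few lines of Python. This is an illustration only, not the procedure GeneTrace used. The 12.8 micro-L and 0.2 micro-L volumes and the 10% overfill come from the text. The function name, the 20 micro-M primer stock concentration, and the reading of 0.2 micro-L polymerase as a per-reaction volume are assumptions made for the example.

```python
# Sketch of the locus-specific master mix arithmetic.
# From the text: 12.8 micro-L PCR mix, 0.2 micro-L AmpliTaq Gold, ~10% overfill,
# 20 micro-L reactions, 1 micro-M final primer concentration.
# Assumed for illustration: primer stock at 20 micro-M, polymerase volume is per reaction.

def master_mix(n_samples, overfill=0.10, reaction_ul=20.0,
               primer_final_um=1.0, primer_stock_um=20.0):
    """Return master mix component volumes in micro-L."""
    n_eff = n_samples * (1.0 + overfill)          # reactions including overfill
    # Primer stock volume per reaction from C1*V1 = C2*V2
    primer_ul = reaction_ul * primer_final_um / primer_stock_um
    return {
        "pcr_mix": round(12.8 * n_eff, 2),
        "polymerase": round(0.2 * n_eff, 2),
        "forward_primer": round(primer_ul * n_eff, 2),
        "reverse_primer": round(primer_ul * n_eff, 2),
    }

def pmol_in_reaction(conc_um, volume_ul):
    """1 micro-M equals 1 pmol per micro-L, so pmol = concentration x volume."""
    return conc_um * volume_ul

mix = master_mix(96)
primer_pmol = pmol_in_reaction(1.0, 20.0)   # 20 pmol, matching the text
```

The second helper simply confirms that 1 micro-M primer in a 20 micro-L reaction corresponds to the 20 pmol quoted in the text.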
Later experiments, including all of the larger sample sets, involved using the following thermal cycling program with TaqGold DNA polymerase:

95 degrees C for 11 min (to activate the TaqGold DNA polymerase)
40 cycles:
   94 degrees C for 30 sec
   55 degrees C for 30 sec
   72 degrees C for 30 sec
60 degrees C for 15 min
4 degrees C hold

Primers were typically designed to have an approximate annealing temperature of 57-63 degrees C and thus worked well with a 55 degrees C anneal step under this "universal" thermal cycling protocol. The need for extensive optimization of primer sets, reaction components, or cycling parameters was greatly reduced or eliminated with this approach for primer development on STR markers. Mitochondrial DNA samples were amplified with 35 cycles and an annealing temperature of 60 degrees C using the PCR primers listed in exhibit 26.

Multiplex PCR

Multiplex PCR was performed using a universal primer tagging approach (Shuber et al., 1995; Ross et al., 1998b) and the following cycling program:

95 degrees C for 10 min
50 cycles:
   94 degrees C for 30 sec
   55 degrees C for 30 sec
   68 degrees C for 60 sec
72 degrees C for 5 min
4 degrees C hold

The PCR master mix contained 5 mM MgCl2, 2 U AmpliTaq Gold with 1X PCR buffer II (ABI), 20 pmol of each universal primer, and 0.2 pmol of each locus-specific primer. The universal primer sequences were 5'-ATTTAGGTGACACTATAGAATAC-3' (attached on the 5' end of locus-specific forward primers) and 5'-TAATACGACTCACTATAGGGAGAC-3' (attached on the 5' end of locus-specific reverse primers). Exhibit 27 shows the primer sequences used for multiplex amplification of up to 18 Y SNP markers. During multiplex PCR development studies, each primer set was tested individually as well as in the multiplex set. Primer sets that were less efficient exhibited a higher amount of remaining primers or primer dimers in CE electropherograms of the PCR products.
"Drop-out" experiments, where one or more primers were removed from the multiplex set, were then conducted to see which primer sets interfered with one another (exhibit 28). Finally, primer concentrations were adjusted to try and improve the multiplex PCR product balance between amplicons. Verification of PCR amplification Following PCR, a 1 micro-L aliquot of the PCR product was typically checked on a 2% agarose gel stained with ethidium bromide to verify amplification success. After a set of primers had been tested multiple times and a level of confidence had been gained for amplifying a particular STR locus, the gel PCR confirmation step was no longer used. Later in this project, a Beckman P/ACE 5500 capillary electrophoresis (CE) instrument was used to check samples after PCR. The quantitative capabilities of CE are especially important when optimizing a multiplex PCR reaction. As long as the products are resolvable, their relative peak area or heights can be used to estimate amplification efficiency and balance during the multiplex PCR reaction. The CE separations were all performed using an intercalating dye and sieving polymer solution as previously described (Butler et al., 1995) to avoid having to fluorescently label the PCR products. Samples were prepared for CE analysis by simply diluting a 1 micro-L aliquot of the amplicon in 49 micro-L of deionized water. SNP reaction and phosphatase treatment For SNP samples, the amplicons were treated with shrimp-alkaline phosphatase (SAP) (Amersham Pharmacia Biotech, Inc., Piscataway, NJ) to hydrolyze the unincorporated dNTPs following PCR (Haff and Smirnov, 1997). Typically, 1 U of SAP was added to each 20 micro-L PCR reaction and then incubated at 37 degrees C for 60 minutes followed by heating at 75 degrees C for 15 minutes. 
The SNP extension reaction consisted of a 5 micro-L aliquot of the SAP-treated PCR product, 1X TaqFS buffer, 1.2-2.4 U TaqFS (ABI), 12.5 micro-M dideoxynucleotide triphosphate (ddNTP) mix, and 0.5 micro-M biotinylated, cleavable SNP primer in a 20 micro-L volume. For multiplex analysis, SNP primer concentrations were balanced empirically, typically in the range of 0.3-1.5 micro-M, and polymerase and ddNTP concentrations were also doubled from the singleplex conditions to facilitate extension from multiple primers. The SNP extension reaction was performed in a thermal cycler with the following conditions: 94 degrees C for 1 min and 25-35 cycles at 94 degrees C for 10 sec, 45-60 degrees C (depending on the annealing temperature of the SNP primer) for 10 sec, and 70 degrees C for 10 sec. An annealing temperature of 52 degrees C was used for the mtDNA 10plex SNP assay.

Sample Cleanup and Mass Spectrometry

Following PCR amplification, a purification procedure involving solid-phase capture and release from streptavidin-coated magnetic beads was utilized (Monforte et al., 1997) to remove salts that interfere with the MALDI ionization process (Shaler et al., 1996). At the start of this project, most of the sample purification was performed manually in 0.6 mL tubes with a 1.5 mL Dynal MPC[R]-E (Magnetic Particle Concentrator for Microtubes of Eppendorf Type) (Dynal A.S., Oslo, Norway). Larger scale experiments performed toward the end of this project utilized a robotic workstation fitted with a 96-tip pipettor that mimicked the manual method. This sample cleanup method involved washing the DNA with a series of chemical solutions to remove or reduce the high levels of sodium, potassium, and magnesium present from the PCR reaction.
The PCR products were then released from the bead with a chemical cleavage step that breaks the covalent bond between the 5'-biotinylated portion of the DNA product and the remainder of the extension product, which contains the STR repeat region or the dideoxynucleotide added during the SNP reaction. In the final step prior to mass spectrometry analysis, samples were evaporated to dryness using a speed vac, reconstituted in 0.5 micro-L of matrix (manual protocol) or 2 micro-L of matrix (robotic protocol), and spotted on the sample plate. The matrix typically used for STR analysis was a 5:1 molar ratio of 3-hydroxypicolinic acid (3-HPA) (Lancaster Synthesis, Inc., Windham, NH) with picolinic acid in 25 mM ammonium citrate (Sigma-Aldrich, St. Louis, MO) and 25% acetonitrile. For SNP analysis, about 0.5 M saturated 3-HPA was used with the same solvent of 25 mM ammonium citrate and 25% acetonitrile. A GeneTrace-designed and built linear time-of-flight mass spectrometer was used as previously described (Wu et al., 1994). Much of the early data were collected manually on a research mass spectrometer. During the time period of this project, GeneTrace also built multiple high-throughput instruments. Automated high-throughput mass spectrometer GeneTrace has designed and custom-built unique, automated time-of-flight mass spectrometers for high-throughput DNA analysis. The basic instrument design is covered under U.S. Patent 5,864,137 (Becker and Young, 1999). A high repetition rate UV laser (e.g., 100 Hz) is used to enable collection of high quality mass spectra consisting of 100-200 summed shots in only a few seconds. The sample chamber can hold up to two sample plates at a time with each plate containing 384 spotted samples. Exhibit 29 shows a sample plate on the X-Y table under the custom GeneTrace ion optics. An important feature of this automated mass spectrometer is "peak picking" software that enables the user to define "good" versus "bad" mass spectra. 
After each laser pulse, the "peak picker" algorithm checks for peaks above a user-defined signal-to-noise threshold in a user-defined mass range. Only "good" spectra are kept and summed into the final sample spectrum, which improves the overall signal quality. The X-Y table moves in a circular pattern around each sample spot until either the maximum number of good shots (e.g., 200) or the maximum number of total shots (e.g., 1,000) is reached. A mass spectrum's signal-to-noise level is related to the number of laser shots collected. In general, signal-to-noise improves as the square root of the number of shots. Thus, improving the signal by a factor of two would require increasing the number of good shots collected by a factor of four. Raw data files (.dat) were converted to "smoothed" data files (.sat) using custom software developed at GeneTrace that improved data quality and involved several multipoint Savitzky-Golay averages along with a baseline subtraction algorithm (Carroll and Beavis, 1996). A set of samples was collected under a single "header" file with identical peak picking parameters. Each header file recorded the mass calibration constants and peak picking parameters and listed all of the samples analyzed with the number of good shots collected versus the number of total shots taken for each sample. Data points in mass spectrometry are collected in spectral channels that must be converted from a time value to a mass value. This mass calibration is normally performed with two oligonucleotides that span the mass range being examined. For example, a 36-mer (10,998 Da) and a 55-mer (16,911 Da) were typically used when examining STRs in the size range of 10,000-40,000 Da. On the other hand, a 15-mer (4,507 Da) and its doubly charged ion (2,253.5 Da) were used to cover SNPs in the size range of 1,500-7,500 Da.
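The two-point time-to-mass conversion described above can be illustrated with a short Python sketch. It assumes the idealized linear time-of-flight relation t = a * sqrt(m) + t0 for singly charged ions. The report does not state the exact calibration equation used by the GeneTrace software, so the model is an assumption, as are the function names and the flight times, which are hypothetical. Only the calibrant masses (10,998 Da and 16,911 Da) come from the text.

```python
import math

def calibrate(t1, m1, t2, m2):
    """Solve t = a*sqrt(m) + t0 for (a, t0) from two calibrant peaks."""
    sqrt_m1, sqrt_m2 = math.sqrt(m1), math.sqrt(m2)
    a = (t2 - t1) / (sqrt_m2 - sqrt_m1)
    t0 = t1 - a * sqrt_m1
    return a, t0

def time_to_mass(t, a, t0):
    """Convert a flight time to mass using the calibration constants."""
    return ((t - t0) / a) ** 2

# Hypothetical flight times (microseconds) for the 36-mer and 55-mer calibrants.
T_36MER, T_55MER = 105.0, 130.0
a, t0 = calibrate(T_36MER, 10998.0, T_55MER, 16911.0)

# Mass assigned to an unknown peak arriving between the two calibrants.
unknown_mass = time_to_mass(118.0, a, t0)
```

Because the fit is anchored at only two points, masses far outside the calibrant range are extrapolated. This is consistent with the remark in the text that larger calibrants would be preferable for STR work if clean peaks could be obtained.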
Ideally, larger mass oligonucleotides would be used for STR analysis to obtain more accurate masses, but producing a clean, well-resolved peak above 25-30 kDa is a synthetic and instrumental challenge. The calibration was typically performed only once per day because the calibration remains consistent over hundreds of samples. The mass accuracy and precision are such that no sizing standards or allelic ladders need to be run to determine a sample's size or genotype (Butler et al., 1998b). Delayed extraction (Vestal et al., 1995) and mass gating ("blanking") were used to improve peak resolution and sensitivity, respectively. Typically, a delay of 500-1,000 nanoseconds was used to eliminate ions below about 8,000 Da for STRs, and a delay of 250-500 nanoseconds was used with a signal blanking below about 1,000 Da for SNPs. Sample Genotyping Automated STR genotyping program (CallSSR or CallSTR) During the time period of this project, GeneTrace developed an automated sample genotyping program, named CallSSR. The data sets described in the Results section were processed either with CallSSR version 1.82 or a modified version of the program named CallSTR. The program was written in C++ at GeneTrace by a scientific programmer named Nathan Hunt and can run on a Windows[R] NT platform. A reference DNA sequence is used to establish the possible STR alleles and their expected masses based on an expected repeat mass and range of alleles. This mass information is recorded in a mass ladder file (exhibit 30). In the case of the forensic STR loci examined in this project, the GenBank sequences were used as the reference DNA sequences. CallSSR accepts as input smoothed, baseline-subtracted data files, a "layout" file, and the mass ladder file. The layout file describes each sample's position on the 384-well plate, the primer set used for PCR (i.e., the STR locus), and the DNA template name. 
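The mass ladder idea can be sketched as follows. This is not the CallSSR source code, which was proprietary C++. It is a minimal Python illustration that builds expected allele masses from one reference allele and a repeat mass, then assigns an observed peak to the nearest expected mass. The function names and the 300 Da tolerance are hypothetical. The TPOX reference values (11 repeats at 21,351 Da) and the 1,260 Da repeat mass are taken from elsewhere in this report and are combined here for illustration only. Microvariant alleles such as TH01 9.3 are not handled.

```python
def build_mass_ladder(ref_allele, ref_mass, repeat_mass, allele_range):
    """Expected mass for each whole-repeat allele, anchored on the reference allele."""
    return {n: ref_mass + (n - ref_allele) * repeat_mass for n in allele_range}

def call_allele(observed_mass, ladder, tolerance):
    """Return the allele with the nearest expected mass, or None if outside tolerance."""
    allele = min(ladder, key=lambda n: abs(ladder[n] - observed_mass))
    if abs(ladder[allele] - observed_mass) <= tolerance:
        return allele
    return None

# Illustrative ladder: reference allele 11 at 21,351 Da, 1,260 Da per repeat.
tpox_ladder = build_mass_ladder(ref_allele=11, ref_mass=21351.0,
                                repeat_mass=1260.0, allele_range=range(6, 15))

called = call_allele(17600.0, tpox_ladder, tolerance=300.0)     # near allele 8
uncalled = call_allele(18200.0, tpox_ladder, tolerance=300.0)   # falls between bins
```

A peak that lands between two bins by more than the tolerance is left uncalled rather than forced into the nearest allele, which is one simple way to flag spectra for manual review.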
The program processes samples at a rate of more than one sample per second so that a plate of 384 samples can be genotyped in less than 5 minutes. This high rate of processing speed is necessary in a high-throughput environment where thousands of samples must be genotyped every day. Two files result from running the program: a "call file" and a "plot file." The call file may be imported into Microsoft[R] Excel for data examination and contains information like the allele mass and calculated sample genotype. The plot file generates plotting parameters that work with MATLAB (The Math Works, Inc., Natick, MA) scripts to plot 8 mass spectra per page, as seen in exhibit 31. Plots are generated in an artificial repeat space to aid visual inspection of the mass spectrometry data compared with allele bins. The CallSSR algorithm has been written to ignore stutter peaks and double-charged peaks, which are artifacts of the DNA amplification step and mass spectrometry ionization process, respectively. Automated SNP genotyping program (CallSNP) In-house automated SNP analysis software was developed and used to determine the genotype for each SNP marker. This program, dubbed CallSNP, was written in C++ by a scientific programmer named Kevin Coopman and will run on a Windows NT or UNIX platform. The software searches for an expected primer mass and, after locating the pertinent primer, searches for the four possible extension products by using a linear least squares fit, with the primer peak shape serving as the fitting line. In this way, peak adducts from the ionization process are distinguished from true heterozygotes. The fit coefficients of the four possible nucleotides are then compared with one another to determine the appropriate SNP base. The base with the highest value (i.e., best fit) is the called base. The mass between the primer and the extension product can then be correlated to the incorporated nucleotide at the SNP site. 
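The template-fitting idea behind CallSNP can be sketched in Python. This is not the CallSNP implementation. It is a simplified illustration in which the primer peak shape is shifted to each of the four expected extension masses and a one-parameter least-squares scale factor is computed at each position. The dideoxynucleotide mass increments, the 1 Da mass grid, the peak shape, and all names are assumptions for the example. Adduct handling and the rule for calling heterozygotes are omitted.

```python
# Assumed mass added by each dideoxynucleotide (Da); illustrative values.
DDN_MASS = {"C": 273.2, "T": 288.2, "A": 297.2, "G": 313.2}

def fit_coefficient(spectrum, template, center_idx):
    """Least-squares scale c minimizing sum((s - c*t)^2) over the template window."""
    half = len(template) // 2
    start = center_idx - half
    window = spectrum[start:start + len(template)]
    numerator = sum(s * t for s, t in zip(window, template))
    denominator = sum(t * t for t in template)
    return numerator / denominator

def call_snp(spectrum, grid_start, step, primer_mass, template):
    """Fit the primer peak shape at each expected extension mass; best fit wins."""
    coeffs = {}
    for base, added_mass in DDN_MASS.items():
        idx = round((primer_mass + added_mass - grid_start) / step)
        coeffs[base] = fit_coefficient(spectrum, template, idx)
    best = max(coeffs, key=coeffs.get)
    return best, coeffs

# Synthetic spectrum on a 1 Da grid from 4,000 to 4,999 Da (hypothetical data).
GRID_START, STEP, PRIMER_MASS = 4000.0, 1.0, 4200.0
peak_shape = [0.25, 0.5, 1.0, 0.5, 0.25]      # stand-in for the measured primer peak
spectrum = [0.0] * 1000
t_center = round((PRIMER_MASS + DDN_MASS["T"] - GRID_START) / STEP)
for offset, value in enumerate(peak_shape):
    spectrum[t_center - 2 + offset] = 0.8 * value   # ddT extension product

base, coeffs = call_snp(spectrum, GRID_START, STEP, PRIMER_MASS, peak_shape)
```

With these assumed masses, the closest pair of extension products (A and T) differ by only 9 Da, which illustrates why a shape-based fit is more robust than simply picking the tallest point.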
In the case of a heterozygote at the SNP site, two extension products exist and are called by the software. As with the CallSSR software, a layout file, a mass file, and mass spectrometry data files are required as input. Call files generate information regarding the closeness of the fit for each possible nucleotide with an error value associated for each call. The SNP mass information file includes the SNP marker name, expected primer mass (postcleavage), and expected SNP bases. The current version of CallSNP works well for singleplex SNPs but needs modifications before it can work effectively on multiplex SNP samples. In principle, the program could be scaled to limited, widely spaced multiplexes where the doubly charged ions of larger mass peaks do not fall in the range of lower mass primer peaks. Comparison Tests With ABI 310 Genetic Analyzer For comparison purposes, more than 200 genomic DNA samples were genotyped using the Applied Biosystems 310 Genetic Analyzer and the AmpF1STR[R] Profiler Plus[TM] or AmpF1STR[R] COfiler[TM] fluorescent STR kit. Exhibit 32 lists the numbers of samples analyzed with each STR kit. STR samples were run in the ABI 310 CE system using the POP-4 polymer, 1X Genetic Analysis buffer, and a 47-cm (50 micro-m i.d.) capillary with the GS STR POP4 (1 mL) F separation module (ABI Prism 310 Genetic Analyzer User's Manual, 1998). With this module, samples were electrokinetically injected for 5 seconds at 15,000 volts and separated at 15,000 volts for 24 minutes with a run temperature of 60 degrees C. DNA sizing was performed with ROX-labeled GS500 as the internal sizing standard. Samples were prepared by adding 1 micro-L PCR product to 20 micro-L deionized formamide containing the ROX-GS500 standard. The samples were heat-denatured at 95 degrees C for 3 minutes and then snap cooled on ice prior to being loaded into the autosampler tray. 
These separation conditions and sizing standards are commonly used in validated protocols by forensic DNA laboratories. Following data collection, samples were analyzed with Genescan 2.1 and Genotyper 2.0 software programs (ABI). While standard CE conditions were used, new PCR conditions were developed to dramatically reduce the cost of using ABI's STR kits. The PCR volume was reduced from the standard 50 micro-L described in the ABI protocol (AmpF1STR[R] Profiler Plus[TM], 1998, and AmpF1STR[R] COfiler[TM], 1998) to 5 micro-L, which corresponded with a cost reduction of 90% per DNA amplification. The kit reagents were mixed in their ABI-specified proportions: 11 micro-L primer mix, 1 micro-L TaqGold polymerase (5 U/micro-L), and 21 micro-L PCR mix. A 3 micro-L aliquot of this master mix was then added to each tube along with 2 micro-L of genomic DNA template (typically at 1-2 ng/micro-L). Both PE9700 and MJ Research thermal cyclers worked for this reduced PCR volume method provided that the 200 micro-L PCR tubes were sealed well to prevent evaporation. Results showed that an 8-strip of 0.2 mL thin wall PCR tubes from Out Patient Services, Inc., (OPS) (Petaluma, CA) worked best for the PE9700 thermal cycler. Only 1 micro-L of PCR product is needed for CE sample preparation, and a prepared sample can be re-injected multiple times if needed. Thus, with a 50 micro-L PCR reaction, 49 micro-L were never used to produce a result under the standard ABI protocol. A 5 micro-L PCR produces less waste in addition to being less expensive. More importantly, the multiplex STR amplicons were more concentrated in a lower volume and produced higher signals in the ABI 310 data collection. Peak signals were often off-scale, and the number of cycles in the cycling program could be reduced from 28 to 26, or even 25 with some DNA templates. This 5 micro-L PCR also worked well with FTA paper punches that have been washed with FTA purification reagent (Life Technologies).
The California Department of Justice ran one plate of 88 samples with the AmpF1STR[R] Profiler[TM] kit on an ABI 310 Genetic Analyzer and provided those genotypes for comparison purposes. These results provided an independent verification of our work. --------------------------- Results and Discussion of STR Analysis by Mass Spectrometry In the course of this work, thousands of data points were collected using STR markers of forensic interest verifying that GeneTrace's mass spectrometry technology works. During this same time, tens of thousands of data points were gathered across hundreds of different microsatellite markers from corn and soybean as part of an ongoing plant genomics partnership with Monsanto Company (St. Louis, MO). Whether the DNA markers used come from humans or plants, the characteristics described below apply when analyzing polymorphic repeat loci. Marker Selection and Feasibility Studies With STR Loci Prior to receiving grant funding, feasibility work had been completed using the STR markers TH01, CSF1PO, FES/FPS, and F13A1 in the summer and fall of 1996 (Becker et al., 1997). At the start of this project, a number of STR loci were considered as possible candidates to expand upon the initial four STR markers and to develop a set of markers that would work well in the mass spectrometer and would be acceptable to the forensic DNA community. Searches were made of publicly available databases, including the Cooperative Human Linkage Center (http://lpg.nci.nih.gov/CHLC), the Genome Database (http://gdbwww.gdb.org), and Weber set 8 of the Marshfield Medical Research Foundation's Center for Medical Genetics (http://www.marshmed.org/genetics). Literature was also searched for possible tetranucleotide markers with PCR product sizes below 140 bp in size to avoid having to redesign the PCR primers to meet our limited size range needs (Hammond et al., 1994; Lindqvist et al., 1996). 
The desired characteristics also included high heterozygosity, moderate number of alleles (<7 or 8 to maintain a narrow mass range) with no known microvariants (to avoid the need for a high degree of resolution), and balanced allele frequencies (most common allele <40% and least common allele >5%). This type of marker screen was found to be rather inefficient because the original primer sets reported in public STR databases were designed for gel-based separations, which were optimal over a size range of 100-400 bp. In fact, most of the PCR product sizes were in the 200-300 bp range. From a set of several thousand publicly available STRs, only eight candidate tetranucleotide STRs were initially identified, three of which were tested using the original reported primers (exhibit 33). The initial goal was to identify ~25 markers that spanned all 22 autosomal chromosomes as well as the X and Y sex chromosomes. Researchers quickly realized that population data were not available on these "new" markers and that the markers would not be readily accepted without extensive testing and validation. Since one of the objectives was to produce STR marker sets that would be of value to the forensic DNA community, the next step taken was the examination of STR markers already in use. After selecting STR markers used by the Promega Corporation, Applied Biosystems, and the Forensic Science Service (FSS), researchers redesigned primer pairs for each STR locus to produce smaller PCR product sizes. These STR markers included TPOX, D5S818, D7S820, D13S317, D16S539, LPL, F13B, HPRTB, D3S1358, VWA, FGA, CD4, D8S1179, D18S51, and D21S11. The primers for TH01 and CSF1PO were also redesigned to improve PCR efficiencies and to reduce the amplicon sizes. Primers for amelogenin, a commonly used sex-typing marker, were also tested (Sullivan et al., 1993). In addition, two Y-chromosome STRs, DYS19 and DYS391, were examined briefly.
Exhibit 34 summarizes the STR primer sets that were developed and tested over the course of this project. However, with the announcement of the 13 CODIS core loci in the fall of 1997, emphasis switched to CSF1PO, TPOX, TH01, D3S1358, VWA, FGA, D5S818, D7S820, D13S317, D16S539, D8S1179, D18S51, D21S11, and the sex-typing marker amelogenin. The newly designed GeneTrace primers produced smaller PCR products than those commercially available from Applied Biosystems or Promega (exhibit 2), yet resulted in identical genotypes in almost all samples tested. For example, correct genotypes were obtained on the human cell line K562, a commonly used control for PCR amplification success. Exhibit 35 shows the K562 results for CSF1PO, TPOX, TH01, and amelogenin. These results were included as part of a publication demonstrating that time-of-flight mass spectrometry could perform accurate genotyping of STRs without allelic ladders (Butler et al., 1998). Caveats of STR analysis by mass spectrometry While mass spectrometry worked well for a majority of the STR markers tested, a few limitations excluded some STRs from working effectively. Two important issues that impact mass spectrometry results are DNA size and sample salts. Mass spectrometry resolution and sensitivity are diminished when either the DNA size or the salt content of the sample is too large (Ross and Belgrader, 1997; and Taranenko et al., 1998). By designing the PCR primers to bind close to the repeat region, the STR allele sizes are reduced so that resolution and sensitivity of the PCR products are benefited. In addition, the GeneTrace-patented cleavage step reduces the measured DNA size even further. When possible, primers are designed to produce amplicons that are less than 120 bp, although work is sometimes undertaken with STR alleles that are as large as 140 bp in size. 
This limitation in size prevents reliable analysis of STR markers with samples containing a large number of repeats, such as most of the FGA, D21S11, and D18S51 alleles (exhibit 2). To overcome the sample salt problem, researchers used a patented solid-phase purification procedure that reduced the concentration of magnesium, potassium, and sodium salts in the PCR products prior to being introduced to the mass spectrometer (Monforte et al., 1997). Without the reduction of the salts, resolution is diminished by the presence of adducts. Salt molecules bind to the DNA during the MALDI ionization process and give rise to peaks that have a mass of the DNA molecule plus the salt molecule. Adducts broaden peaks and thus reduce peak resolution. The sample purification procedure, which was entirely automated on a 96-tip robotic workstation, reduced the PCR buffer salts and yielded "clean" DNA for the mass spectrometer. Appropriate care must be taken to prevent samples from being contaminated with salts both during and after the sample purification procedure. Size reduction methods The portion of the DNA product on the other side of the repeat region from the cleavable primer was removed in one of two possible ways: using a restriction enzyme (Monforte et al., 1999) or performing a nested linear amplification with a ddN terminating nucleotide (Braun et al., 1997a and 1997b). Both methods have pros and cons. A restriction enzyme, DpnII, which recognizes the sequence 5'... ^GATC...3', was used with VWA samples to remove 45 bp from each PCR product. For example, the GenBank allele that contains 18 repeat units and is 154 bp following PCR amplification may be reduced to 126 nucleotides following primer cleavage, but it can be shortened to 81 nucleotides if primer cleavage is combined with DpnII digestion. At 81 nt or 25,482 Da, the STR product size is much more manageable in the mass spectrometer. 
This approach works nicely provided the restriction enzyme recognition site remains unchanged. The DpnII digestion of VWA amplicons worked on all samples tested, including a reamplification of an allelic ladder from ABI (exhibit 36). However, the cost and time of analysis are increased with the addition of a restriction enzyme step. The second approach for reducing the overall size of the DNA molecule in the mass spectrometer involved using a single ddNTP with three regular dNTPs. A linear amplification extension reaction was performed with the ddNTP terminating the reaction on the opposite side of the repeat from the cleavable primer. However, there were several limitations with this "single base sequencing" approach. First, it only worked if the repeat did not contain all four nucleotides. For example, a nucleotide mixture of dideoxycytosine (ddC), deoxyadenosine (dA), deoxythymidine (dT), and deoxyguanosine (dG) will allow extension through an AATG repeat (as occurs in the bottom strand of TH01) but will terminate at the first C nucleotide in a TCAT repeat (the top strand of TH01). Thus one is limited with the DNA strand that can be used for a given combination of dideoxynucleotide and corresponding deoxynucleotides. In addition, primer position and STR sequence content are important. If a ddC mix is used, the DNA sample cannot contain any C nucleotides prior to the repeat region or within the repeat, or the extension will prematurely halt and the information content of the full repeat will not be accurately captured. In most cases, this requires the extension primer to be immediately adjacent to the STR repeat, a situation that is not universally available due to the flanking sequences around the repeat region. For example, this approach will work with TH01 (AATG) but not VWA, which has three different repeat structures: AGAT, AGAC, and AGGT. 
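The constraint that the terminating nucleotide places on repeat sequence can be made concrete with a small Python sketch. It models only the logic of the "single base sequencing" approach: the strand being synthesized grows until the first position that calls for the terminator base, at which point the dideoxynucleotide is incorporated and extension stops. The function name and the flanking sequences are hypothetical, and the repeat counts are arbitrary.

```python
def extend_with_terminator(synthesized_seq, terminator="C"):
    """Return the bases added to the primer, read 5' to 3' on the new strand.

    synthesized_seq is the sequence the polymerase would make with four normal
    dNTPs. Extension halts once the terminator base has been incorporated.
    """
    product = []
    for base in synthesized_seq:
        product.append(base)
        if base == terminator:
            break  # dideoxynucleotide incorporated; the chain cannot grow further
    return "".join(product)

# AATG strand (TH01-like): ddC reads through all repeats and stops in the flank.
aatg_product = extend_with_terminator("AATG" * 7 + "AGC", terminator="C")

# TCAT strand of the same locus: ddC stops inside the first repeat.
tcat_product = extend_with_terminator("TCAT" * 7, terminator="C")

# Compound repeat (VWA-like): ddC passes AGAT units but stops at the first AGAC.
compound_product = extend_with_terminator("AGAT" * 3 + "AGAC" * 2 + "AGAT",
                                          terminator="C")
```

Only the first case captures the full repeat length. The other two products are truncated inside the repeat region, so alleles of different length would give the same product and the genotype information would be lost.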
Thus with VWA, a ddC would extend through the AGAT repeat but would be prematurely terminated at the C in the AGAC repeat, and valuable polymorphic information would be lost. The use of a terminating nucleotide also provides a sharper peak for an amplified allele compared with the split peaks or wider peaks (if resolution is poor) that can result from partially adenylated amplicons (i.e., -A/+A). Exhibit 37 illustrates the advantage of a ddG termination on a D8S1179 heterozygous sample containing 11 and 13 TATC repeats. In the bottom panel, 23 nt were removed compared with the top panel, which corresponds to a mass reduction of almost 8,000 Da. The peaks are sharper in the lower panel, as the products are blunt ended. Identical genotypes were obtained with both approaches, illustrating that the ddG termination is occurring at the same point on the two different sized alleles. To summarize, STR sample sizes were reduced using primers that have been designed to bind close to the repeat region or even partially on the repeat itself. A cleavable primer was incorporated into the PCR product to allow post-PCR chemical cleavage and subsequent mass reduction. Two additional post-PCR methods were also explored to further reduce the measured DNA size. These methods included restriction enzyme digestion in the flanking region on the other side of the repeat region from the cleavable primer and a primer extension through the repeat region with a single dideoxynucleotide terminator (single base sequencing approach). To illustrate the advantages of these approaches to reduce the overall DNA product mass, researchers examined the STR locus TPOX. Using a conventional primer set, a sample containing 11 repeats measured 232 bp or about 66,000 Da. By redesigning the primers to anneal close to the repeat region, a PCR product of 89 bp was obtained. With the cleavable primer, the size was reduced to 69 nt or 21,351 Da. 
By incorporating a ddC termination reaction, another 20 nt were removed leaving only 49 nt or about 12,000 Da (primarily only the repeat region). The repeat region contained 44 nt (4 nt x 11 repeats) or about 10,500 Da. The ddC termination was also used in multiplex STR analysis to produce a CSF1PO-TPOX-TH01 triplex (exhibits 4 and 5). The repeat sequences used for these STR loci were AGAT for CSF1PO, AATG for TPOX, and AATG for TH01. The level of sequence clipping by ddC was as follows: CSF1PO (-14 nt), TPOX (-20 nt), and TH01 (-4 nt). Multiplex STR Work Due to the limited size range of DNA molecules that may be analyzed by this technique, a new approach to multiplexing was developed that involved interleaving alleles from different loci rather than producing nonoverlapping multiplexes. If the amplicons could be kept under about 25,000 Da, a high degree of mass accuracy and resolution could be used to distinguish alleles from multiple loci that may differ by only a fraction of a single nucleotide (exhibit 10). Allelic ladders are useful to demonstrate that all alleles in a multiplex are distinguishable (exhibit 9). The expected masses for a triplex involving the STR loci CSF1PO, TPOX, and TH01 (commonly referred to as a CTT multiplex) are schematically displayed in exhibit 4. All known alleles for these STR loci, as defined by STRBase (Ruitberg et al., 2001), are fully resolvable and far enough apart to be accurately determined. For example, TH01 alleles 9.3 and 10 fall between CSF1PO alleles 10 and 11. For all three STR systems in this CTT multiplex, the AATG repeat strand is measured, which means that the alleles within the same STR system differ by 1,260 Da. The smallest spread between alleles across multiple STR systems in this particular multiplex exists between the TPOX and TH01 alleles, where the expected mass difference is 285 Da. TPOX and CSF1PO alleles differ by 314 Da, while TH01 and CSF1PO alleles differ by 599 Da. 
By using the same repeat strand in the multiplex, the allele masses between STR systems all stay the same distance apart. Each STR has a unique flanking region and it is these sequence differences between STR systems that permit multiplexing in such a fashion as described here. An actual result with this CTT multiplex is shown in exhibit 5. This particular sample is homozygous for both TPOX (8,8) and CSF1PO (12,12) and heterozygous at the TH01 locus (6,9.3). It is also worth noting that this particular CTT multiplex was designed to account for possible, unexpected microvariants. For example, a CSF1PO allele 10.3 that appears to be a single base shorter than CSF1PO allele 11 was recently reported (Lazaruk et al., 1998). With the CTT multiplex primer set described here, a CSF1PO 10.3 allele would have an expected mass of 21,402 Da, which would be fully distinguishable from the nearest possible allele (i.e., TH01 allele 10) because these alleles would be 286 Da apart. Using a mass window of 100 Da as defined by previous precision studies (Butler et al., 1998), all possible alleles including microvariants should be fully distinguishable. STR multiplexes are designed so that expected allele masses between STR systems are offset in a manner that possible microvariants, which are most commonly insertions or deletions of a partial repeat unit, may be distinguished from all other possible alleles. The larger the allele's mass range, the more difficult it becomes to maintain a high degree of mass accuracy. For example, exhibit 8 shows the observed mass for TH01 allele 9.3 is -52 Da from its expected mass, while TPOX allele 9 is only 3 Da from its expected mass. In this particular case, the mass calibrants used were 4,507 Da and 10,998 Da. Thus, the TPOX allele's mass measurement was more accurate and closer to the calibration standard. 
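The interleaving idea can be checked numerically with a minimal sketch. The 1,260 Da repeat mass, the 285, 314, and 599 Da inter-locus offsets, and the 100 Da window come from the text. The anchor masses, the relative step index `k`, and the reading of the window as plus or minus 100 Da are assumptions made only for illustration.

```python
from itertools import combinations

REPEAT_MASS = 1260.0  # Da per repeat unit on the measured strand (from the text)

# Hypothetical anchor masses (Da), chosen only so the inter-locus offsets match
# the 285, 314, and 599 Da spacings quoted in the text. Not measured values.
ANCHOR_MASS = {"TH01": 20000.0, "TPOX": 20285.0, "CSF1PO": 20599.0}


def expected_masses(steps=3):
    """Expected masses for alleles `steps` repeats either side of each anchor."""
    table = {}
    for locus, anchor in ANCHOR_MASS.items():
        for k in range(-steps, steps + 1):
            table[(locus, k)] = anchor + k * REPEAT_MASS
    return table


def min_interlocus_gap(table):
    """Smallest mass difference between alleles that belong to different loci."""
    return min(
        abs(m1 - m2)
        for (k1, m1), (k2, m2) in combinations(table.items(), 2)
        if k1[0] != k2[0]
    )


def call_allele(observed, table, window=100.0):
    """Return the (locus, step) nearest to `observed`, or None if outside the window."""
    key, mass = min(table.items(), key=lambda kv: abs(kv[1] - observed))
    return key if abs(mass - observed) <= window else None


table = expected_masses()
print(min_interlocus_gap(table))    # closest approach between different loci
print(call_allele(20325.0, table))  # 40 Da from a TPOX allele
print(call_allele(20150.0, table))  # more than 100 Da from every allele
```

Because every locus steps by the same repeat mass, the smallest inter-locus gap stays constant across the whole ladder, so one check on the anchors covers all alleles.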
The ability to design multiplexes that have a relatively compact mass range is important to maintaining the high level of mass accuracy needed for closely spaced alleles from different, overlapping STR loci. The mass calibration standards should also span the entire region of expected measurement to guarantee the highest degree of mass accuracy. Two possible multiplexing strategies for STR genotyping are illustrated in exhibit 38. Starting with a single punch of blood stained FTA paper, it is possible to perform a multiplex PCR (simultaneously amplifying all STRs of interest) followed by another PCR with primer sets that are closer to the repeat region. With this approach, single or multiplexed STR products can be produced that are small enough for mass spectrometry analysis. Alternatively, multiple punches could be made from a single bloodstain on the FTA paper followed by singleplex or multiplex PCR with mass spectrometry primers. After the genotype is determined for each STR locus in a sample, the information would be combined to form a single sample genotype for inclusion in CODIS or some other DNA database. This multiplexing approach permits flexibility for adding new STR loci or only processing a few STR markers across a large number of samples at a lower cost than processing extensive and inflexible STR multiplexes. Comparison Tests Between ABI 310 and Mass Spectrometry Results A plate of 88 samples from the CDOJ DNA Laboratory was tested with 10 different STR markers and compared with results obtained using the ABI 310 Genetic Analyzer and commercially available STR kits. The samples were supplied as a 200 micro-L aliquot of extracted genomic DNA in a 96-well tray with each sample at a concentration of 1 ng/micro-L. A 5 micro-L aliquot was used for each PCR reaction, or 5 ng total per reaction. 
Since each marker was amplified and examined individually, approximately 35 ng of extracted genomic DNA was required to obtain genotypes on the same 7 markers that are amplified in a single AmpFlSTR[R] COfiler[TM] STR multiplex. Only 2 ng of genomic DNA was used per reaction with the AmpFlSTR[R] COfiler[TM] kit. Thus, a multiplex PCR reaction is much better suited for situations where the quantity of DNA is limited (e.g., a crime scene sample). However, in most cases involving high-throughput DNA typing (e.g., offender database work), hundreds of nanograms of extracted DNA would be easily available.

A major advantage of the mass spectrometry approach is its speed and its high-throughput capability when combined with robotic sample preparation. The data collection times required for the 88 CDOJ samples using the ABI 310 Genetic Analyzer and GeneTrace's mass spectrometry method are compared in exhibit 11. While it took the ABI 310 almost 3 days to collect the data for the 88 samples, the same genotypes were obtained on the mass spectrometer in less than 2 hours. Even the ability to analyze multiple STR loci simultaneously with different fluorescent tags on the ABI 310 could not match the speed of GeneTrace's mass spectrometry data collection with each marker run individually.

To verify that the mass spectrometry approach produces accurate results, comparison studies were performed on the genotypes obtained from the two different methods across 8 different STR loci. Exhibits 12-19 contain a direct comparison with 1,408 possible data points (2 methods x 88 samples x 8 loci). With a few minor exceptions, agreement between the two methods was nearly 100%. In addition to the data obtained on the 8 loci from both the ABI 310 and the mass spectrometer, two additional markers (D8S1179 and DYS391) were measured by mass spectrometry across these same 88 samples (exhibits 39-40).
Both the D8S1179 and the DYS391 primer sets worked extremely well in the mass spectrometer (exhibit 41). Thus, it is likely that if results were made available on these same samples with fluorescent STR primer sets (e.g., D8S1179 is in the AmpFlSTR[R] Profiler Plus[TM] kit), a similar correlation between the two methods would be observed.

PCR Issues

Null alleles

When making comparisons between two methods that use different PCR primer sets, the issue is whether a different primer set for a given STR locus will result in different allele calls because of sequence polymorphisms in the primer binding sites. In other words, do primers used for mass spectrometry that are closer to the repeat region than those primers used in fluorescent STR typing yield the same genotype? Differences between primer sets are possible if there are sequence differences outside the repeat region that occur in the primer binding region of either set of primers (exhibit 42). This phenomenon produces what is known as a "null" allele: the DNA template exists for a particular allele but fails to amplify during PCR due to primer hybridization problems.

In all cases except the STR locus D7S820, there was excellent correlation in genotype calls between the two methods (where mass spectrometry and CE results were obtained), signifying that the mass spectrometry primers did not produce any null alleles. For the STR locus D7S820, the two methods did not agree for 17 of 88 samples (exhibit 18). The bottom two panels in exhibit 43 illustrate more microheterogeneity at this locus than previously reported. On the lower left plot, only the allele 10 peak can be seen; allele 8, which was seen with PCR amplification using a fluorescent primer set, is missing (see position of red arrow in exhibit 43).
On the lower right plot, both allele 8 and allele 10 are amplified and detected in the mass spectrometer, confirming that the problem is with the PCR amplification and not the mass spectrometry data collection. In this particular case, there is a difference between those two allele 8s, meaning that the mass spectrometer primer set identified a new, previously unreported allele. When using fluorescent primer sets that anneal 50-100 bases or more from the repeat region, a single-base change (e.g., T to C) out of a 300 bp PCR product is difficult to detect.

Upon comparing the results of mass spectrometer data where there were missing alleles with the results from the ABI 310, it was noted that the situation occurred only with some allele 8s, 9s, and 10s (see underlined alleles in the ABI 310 column of exhibit 18). Thus, these null alleles were variants of alleles with 8, 9, or 10 repeats. Most likely, a sequence microvariant occurs within the repeat region near the 3'-end of the reverse primer, which anneals to two full repeats. Unfortunately, time constraints prevented the gathering of sequence information for these samples to confirm the observed variation. Interestingly, the D7S820 locus has been reported to cause similar null allele problems with other primer sets (Schumm et al., 1997).

Microvariants

Sequence variation between alleles can take the form of insertions, deletions, or nucleotide changes. Alleles containing some form of sequence variation compared with more commonly observed alleles are often referred to as microvariants because they are slightly different from full repeat alleles. For example, the STR locus TH01 contains a 9.3 allele, which has 9 full repeats (AATG) and a partial repeat of 3 bases (ATG). In this particular example, the 9.3 allele differs from the 10 allele by a single base deletion of adenine. Microvariants exist for most STR loci and are being identified in greater numbers as more samples are being examined around the world.
In this study, three previously unreported STR microvariants (exhibit 44) were discovered during the analysis of 38 genomic DNA samples from a male population data set provided by Dr. Oefner (exhibit 45). These microvariants occurred in the three most polymorphic STR loci that possess the largest and most complex repeat structures: FGA, D21S11, and D18S51.

The ability to make accurate mass measurements with mass spectrometry is a potential advantage when locating new microvariants. If the mass precision is good, then any peaks that have large offsets from the expected full repeat alleles could be suspected microvariants in the form of insertions or deletions because their masses would fall outside the expected variance due to instrument variation. This possibility is especially true when working with heterozygous samples. Microvariants can be detected by using the mass difference between the two alleles and comparing this value with the expected value for full repeats or with the allele peak mass offsets. If the peak mass offsets shift together, then both alleles are full repeats, but if one of the peak mass offsets is significantly different (e.g., about 300 Da), a possible insertion or deletion exists in one of the alleles.

Exhibit 46 illustrates this concept by plotting the mass offset (from a calculated allele mass) of allele 1 versus the mass offset (from a calculated allele mass) of allele 2. Note that the 9.3 microvariant (i.e., partial) repeat alleles for TH01 cluster away from the comparison of full repeat versus full repeat allele. On the other hand, results for the other three STR loci, which have no known microvariants in this data set, have mass offsets that shift together for the heterozygous alleles. Exhibit 47 compares the peak mass offsets for the amelogenin X allele with the Y allele and demonstrates that full "repeats" shift together during mass spectrometry measurements.
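The offset comparison for heterozygous samples can be sketched as follows. The function names and the 150 Da tolerance are assumptions for illustration. The tolerance was picked as roughly half of the about 300 Da shift that the text associates with a single-nucleotide insertion or deletion, and it is not a value taken from the study.

```python
def mass_offset(observed: float, expected_full_repeat: float) -> float:
    """Offset (Da) of an observed peak from the calculated full-repeat allele mass."""
    return observed - expected_full_repeat


def possible_microvariant(offset_allele1: float, offset_allele2: float,
                          tolerance: float = 150.0) -> bool:
    """Flag a heterozygote whose two allele offsets do not shift together.

    Instrument drift should move both peaks by a similar amount, so a large
    difference between the two offsets suggests an insertion or deletion in
    one allele. `tolerance` is a hypothetical cutoff, not a validated one.
    """
    return abs(offset_allele1 - offset_allele2) > tolerance


# Both peaks drift about 40 Da low: consistent with two full-repeat alleles.
print(possible_microvariant(-40.0, -35.0))

# One peak sits about 350 Da low while the other is only 35 Da low:
# consistent with a one-base deletion in the first allele.
print(possible_microvariant(-353.0, -35.0))
```

Comparing the two offsets against each other, rather than each against zero, keeps the check insensitive to a calibration shift that affects the whole spectrum.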
Nontemplate addition DNA polymerases, particularly the Taq polymerase used in PCR, often add an extra nucleotide to the 3'-end of a PCR product as template strands are copied. This nontemplate addition--which is most often an adenine, hence, the term "adenylation"--can be favored by adding a final incubation step at 60 degrees C or 72 degrees C after the temperature cycling steps in PCR (Clark, 1988, and Kimpton et al., 1993). However, the degree of adenylation is dependent on the sequence of the template strand, which in the case of PCR results from the 5'-end of the reverse primer. Thus, every locus will have different adenylation properties because the primer sequences are different. From a measurement standpoint, it is better to have all molecules of a PCR product as similar as possible for a particular allele. Partial adenylation, where some of the PCR products do not have the extra adenine (i.e., -A peaks) and some do (i.e., +A peaks), can contribute to peak broadness if the separation system's resolution is poor (see top panel of exhibit 37). Sharper peaks improve the likelihood that a system's genotyping software can make accurate calls. Variation in the adenylation status of an allele across multiple samples can have an impact on accurate sizing and genotyping potential microvariants. For example, a nonadenylated TH01 10 allele would look the same as a fully adenylated TH01 9.3 allele in the mass spectrometer because their masses are identical. Therefore, it is beneficial if all PCR products for a particular amplification are either +A or -A rather than a mixture (e.g., +/-A). By using the temperature soak at the end of thermal cycling, most of the STR loci were fully adenylated, with the notable exception of TPOX, which was typically nonadenylated, and TH01, which under some PCR conditions produced partially adenylated amplicons. 
For making correct genotype calls, the STR mass ladder file (exhibit 30) was altered according to the empirically determined adenylation status. During the course of this project, Platinum[R] GenoTYPE[TM] Tsp DNA polymerase (Life Technologies, Rockville, MD) became available that exhibits little to no nontemplate nucleotide addition. This new DNA polymerase was tested with STR loci that had been shown to produce partial adenylation to see if the +A peak could be eliminated. Exhibit 48 compares mass spectrometry results obtained using AmpliTaq Gold (commonly used) polymerase with the new Tsp polymerase. The Tsp polymerase produced amplicons with only the -A peaks, while TaqGold showed partial adenylation with these TH01 primers. Thus, this new polymerase has the potential to produce sharper peaks (i.e., no partial adenylation) and allele masses that can be more easily predicted (i.e., all PCR products would be nonadenylated). Stutter products During PCR amplification of STR loci, repeat slippage can occur and result in the loss of a repeat unit as DNA strand synthesis occurs through a repeated sequence. These stutter products are typically 4 bases, or one tetranucleotide repeat, shorter than the true allele PCR product. The amount of stutter product compared to the allele product varies depending on the STR locus and the length of the repeat, but typically stutter peaks are 2-10% of the allele peak height (Walsh et al., 1996). Forensic DNA scientists are concerned about stutter products because their presence can interfere in the interpretation of DNA mixture profiles. When reviewing plots of GeneTrace's mass spectrometry results for STR loci, forensic scientists have commented on the reduced level of stutter product detection (exhibit 35). 
There are two possibilities for this reduction:

o Since the primers are closer to the repeat region, smaller PCR products are amplified, which means that the DNA polymerase does not have to hold on to the extending strand as long for synthesis purposes. It is possible that the polymerase reads through the repeat region "faster" and, therefore, the template strands do not have as much of an opportunity to slip and reanneal out of register on the repeat region. For example, Taq polymerase has a processivity of about 60 bases before it falls off the extending DNA strand; therefore, the closer the PCR product size is to 60 bases, the better the extension portion of the PCR cycle. GeneTrace's PCR products, which are typically less than 100 bp, are much smaller than those produced by the fluorescently labeled primer sets used by most forensic DNA laboratories (exhibit 2). However, this needs to be studied more extensively with multiple primer sets on a particular STR locus that generate various sized amplicons. For example, the primer sets described in exhibit 24 could be fluorescently labeled and analyzed on the ABI 310, where the stutter product peak heights could be quantitatively compared to the allele peak heights.

o The more likely reason that less stutter is observed by mass spectrometry is that the signal-to-noise ratio is much lower in mass spectrometry than in fluorescence measurements. Fluorescence techniques have a much lower background and are more sensitive for the detection of DNA than mass spectrometry. Thus, stutter may be present at similar ratios compared with those observed in fluorescence measurements, but because stutter is part of the baseline noise of mass spectrometry data, it may not be seen in the mass spectrum. This latter explanation is supported by the very strong stutter peaks observed for some dinucleotide repeat markers (exhibit 49).
Whether stutter products are present or not, GeneTrace's current STR genotyping software has been designed to recognize them and not call them as alleles. Primer sequence determinations from commercial STR kits Primarily, two commercial manufacturers supply STR kits to the forensic DNA community: Promega Corporation and Applied Biosystems. These kits come with PCR primer sequences that permit simultaneous multiplex PCR amplification of up to 16 STR loci. One of the primers for each STR locus is labeled with a fluorescent dye to permit fluorescent detection of the labeled PCR products. Since the primer sequences are not disclosed by the manufacturers, mass spectrometry was used to determine where they annealed to the STR sequences compared with GeneTrace primers (see previous discussion on null alleles). First, the primer mixtures were spotted and analyzed to determine each primer's mass (top panel of exhibit 50). Then a 5' to 3' exonuclease was added to the primer mix and heated to 37 degrees C for several minutes to digest the primer one base at a time. An aliquot was removed every 5-10 minutes to obtain a time course on the digestion reaction. Each aliquot was spotted in 3-hydroxypicolinic acid matrix solution (Wu et al., 1993), allowed to dry, and analyzed in the mass spectrometer. A digestion reaction produces a series of products that differ by one nucleotide. By measuring the mass difference between each peak, the original primer sequence may be determined (bottom panel of exhibit 50). Only the unlabeled primers will be digested because the covalently attached fluorescent dye blocks the 5'-end of the dye-labeled primer. Using only a few bases of sequence (e.g., 4-5 bases), it is possible to make a match on the appropriate STR sequence obtained from GenBank to determine the 5'-end of the primer without the fluorescent label. With the full-length primer mass obtained from the first experiment, the remainder of the unlabeled primer can be identified. 
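Reading a sequence from the exonuclease digestion ladder can be sketched as below. The residue masses are commonly quoted average values for nucleotides within a DNA strand and are an assumption here, since the report does not list them. The 3 Da matching tolerance, the function names, and the test primer sequence are hypothetical.

```python
# Average residue masses (Da) for nucleotides within a DNA strand.
# Commonly quoted values; assumed here, not taken from the report.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def read_ladder(peak_masses, tolerance=3.0):
    """Infer bases removed from the 5'-end from successive peak mass differences.

    Peaks are sorted from heaviest (intact primer) to lightest. Each step down
    the ladder is matched to the closest residue mass; steps that match no
    residue within `tolerance` are reported as 'N'.
    """
    peaks = sorted(peak_masses, reverse=True)
    bases = []
    for heavier, lighter in zip(peaks, peaks[1:]):
        diff = heavier - lighter
        base, mass = min(RESIDUE_MASS.items(), key=lambda kv: abs(kv[1] - diff))
        bases.append(base if abs(mass - diff) <= tolerance else "N")
    return "".join(bases)


def simulate_ladder(seq, n_cuts):
    """Build an idealized ladder: intact mass, then one 5' base removed per step."""
    peaks = [sum(RESIDUE_MASS[b] for b in seq)]
    for base in seq[:n_cuts]:
        peaks.append(peaks[-1] - RESIDUE_MASS[base])
    return peaks


# Hypothetical primer; five digestion steps recover its first five 5' bases.
ladder = simulate_ladder("GATTACAGGCT", 5)
print(read_ladder(ladder))
```

The closest pair of residue masses (A and T) differ by about 9 Da, so a tolerance below half that value keeps the base assignment unambiguous in this idealized model. Real spectra would also need peak picking and adduct handling, which are outside this sketch.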
The position of the 5'-end of the other primer can be determined using the GenBank sequence and the PCR product length for the appropriate STR allele listed in GenBank (exhibit 2). The sequence of the labeled primer can be ascertained by using the appropriate primer mass determined from the first experiment and subtracting the mass of the fluorescent dye. The primer mass is then used to obtain the correct length of the primer on the GenBank sequence and the primer's sequence. Finally, an entire STR multiplex primer set can be measured together in the mass spectrometer to observe the primer balance (exhibit 51). High-performance liquid chromatography fraction collection can be used to pull primers apart from complex, multiplex mixtures, and each primer can be identified as previously described.

The Promega and Applied Biosystems primer sequences for the STR loci TH01 (exhibit 52), TPOX (exhibit 53), and CSF1PO (exhibit 54) were identified using this procedure. A comparison of the primer sequences from the two manufacturers found that they were very similar. The 3'-ends of the primer sets--the most critical portions for annealing during PCR--were almost identical between the different kits. The ABI primers were typically shorter at the 5'-end and, therefore, produced PCR products that were ~10 bases shorter than those produced by the corresponding Promega primers. In all three STR loci, the primers annealed further away from the repeat region than the GeneTrace primer sets.

Analytical Capabilities of This Mass Spectrometry Method

Using the current primer design strategy, most STR alleles ranged in size from about 10,000 Da to about 40,000 Da. In mass spectrometry, the smaller the molecule, the easier it is to ionize and detect (all other things being equal). Resolution, sensitivity, and accuracy are usually better the smaller the DNA molecule being measured.
Because the possible STR alleles are relatively far apart, reliable genotyping is readily attainable even with DNA molecules at the higher mass region of the spectrum. For example, neighboring full-length alleles for a tetranucleotide repeat, such as AATG, differ in mass by 1,260 Da. Resolution Dinucleotide repeats, such as CA repeats, require a resolution of at least 2 bp in order to resolve stutter products from the true allele or heterozygotes that differ by a single repeat. Trinucleotide and tetranucleotide repeats, with their larger repeat structure, are more easily resolved because there is a larger mass difference between adjacent alleles. However, the overall mass of the PCR product increases more rapidly with tri- or tetranucleotide repeats. For example, the repeat region for 40 GA repeats is 25,680 Da, while the mass of the repeat region quickly increases to 37,200 Da for 40 AAT repeats and 50,400 Da for 40 AATG repeats. GeneTrace has demonstrated that a resolution of a single dinucleotide repeat (about 600 Da) may be obtained for DNA molecules up to a mass of about 35,000 Da. This reduced resolution at higher mass presents a problem for polymorphic STR loci such as D18S51, D21S11, and FGA because single base resolution is often required to accurately call closely spaced alleles or to distinguish a microvariant containing a partial repeat from a full-length allele. These three STR loci also contain long alleles. For example, D21S11 has reported alleles of up to 38 repeats (mixture of TCTA and TCTG) in length, D18S51 up to 27 AGAA repeats, and FGA up to 50 repeats (mixture of CTTT and CTTC). Heterozygous FGA alleles that differed by only a single repeat were more difficult to genotype accurately than smaller sized STR loci due to poor resolution at masses greater than about 35,000 Da (see samples marked in red in exhibit 19). The analysis of STR allelic ladders demonstrates that all alleles can be resolved for an STR locus. 
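The repeat-region masses quoted in this discussion can be approximated from nucleotide composition alone. The sketch below uses commonly quoted average residue masses, which are an assumption since the report does not state the values it used, and the function name is hypothetical.

```python
# Average residue masses (Da) for nucleotides within a DNA strand.
# Commonly quoted values; assumed here, not taken from the report.
RESIDUE_MASS = {"A": 313.21, "C": 289.18, "G": 329.21, "T": 304.20}


def repeat_region_mass(unit: str, n_repeats: int) -> float:
    """Approximate mass (Da) of `n_repeats` copies of the repeat `unit`."""
    return n_repeats * sum(RESIDUE_MASS[base] for base in unit.upper())


# Mass of one AATG repeat, then 40 copies of a di-, tri-, and tetranucleotide repeat.
print(round(repeat_region_mass("AATG", 1)))
for unit in ("GA", "AAT", "AATG"):
    print(unit, round(repeat_region_mass(unit, 40)))
```

With these residue masses a single AATG unit comes to about 1,260 Da, and the 40-repeat totals land within roughly 0.1% of the 25,680, 37,200, and 50,400 Da figures in the text. The small differences are consistent with the text rounding the per-repeat mass before multiplying.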
Allelic ladders from commercial kits were typically diluted 1:1000 with deionized water and then reamplified with the GeneTrace primers that bound closer to the repeat region than the primers from the commercial kits. This reamplification provided PCR products for demonstrating that the needed level of resolution (i.e., distinguishing adjacent alleles) is achievable at the appropriate mass range in the mass spectrometer as well as demonstrating that the GeneTrace primers amplify all alleles (i.e., no allele dropout from a null allele). A number of STR allelic ladders were tested in this fashion, including TH01 (exhibit 55); CSF1PO, TPOX, and VWA (exhibit 36); and D5S818 (exhibit 56). All tetranucleotide repeat alleles were resolvable in these examples, demonstrating 4 bp resolution, and for TH01, single base pair resolution was seen between alleles 9.3 and 10.

Sensitivity

To determine the sensitivity of GeneTrace's STR typing assay, TPOX primers were tested with a dilution series of K562 genomic DNA (20 ng, 10 ng, 5 ng, 2 ng, 1 ng, 0.5 ng, 0.2 ng, and 0 ng). Promega's Taq polymerase and STR buffer were used with 35 PCR cycles as described in the scope and methodology section. Peaks for the correct genotype (heterozygote 8,9) could be seen down to the lowest level tested (0.2 ng or 200 picograms), while the negative control was blank. Exhibit 57 contains a plot with the mass spectra for 20 ng, 5 ng, 0.5 ng, and 0 ng. While each PCR primer pair can exhibit a slightly different efficiency, human DNA down to a level of about 1 ng can be reliably PCR amplified and detected using mass spectrometry. GeneTrace's most recent protocol involved 40-cycle PCR and the use of TaqGold[TM] DNA polymerase, which should improve overall yield for STR amplicons. All of the samples tested from CDOJ were amplified with only 5 ng of DNA template and yielded excellent results (exhibits 12-19, 31, 39, 40, 43, 58, and 59).
In terms of absolute sensitivity in the mass spectrometer, several hundred femtomoles of relatively salt-free DNA molecules were typically found necessary for detection. GeneTrace's PCR amplifications normally produced several picomoles of PCR product, approximately an order of magnitude more material than is actually needed for detection. Mass accuracy and precision Mass accuracy is an important issue for this mass spectrometry approach to STR genotyping, as a measured mass for a particular allele is compared with an ideal mass for that allele. Due to the excellent accuracy of mass spectrometry, internal standards are not required to obtain accurate DNA sizing results as in gel or CE measurements (Butler et al., 1998). To make an inaccurate genotype call for a tetranucleotide repeat, the mass offset from an expected allele mass would have to be larger than 600 Da (half the mass of an about 1,200 Da repeat). GeneTrace has observed mass accuracies on the order of 0.01 nucleotides (<3 Da) for STR allele measurements. However, under routine operation with GeneTrace's automated mass spectrometers, some resolution, sensitivity, and accuracy may be sacrificed compared with a research-grade instrument to deliver data at a high rate of speed. Almost all STR allele size measurements should be within +/-200 Da, or a fraction of a single nucleotide, of the expected mass. Exhibit 60 illustrates that the precision and accuracy for STR measurements is good enough to make accurate genotyping calls with only a routine mass calibration, even when comparing data from the same samples collected months apart. Precision is important for STR allele measurements in mass spectrometry because no internal standards are being run with each sample to make adjustments for slight variations in instrument conditions between runs. To demonstrate the excellent reproducibility of mass spectrometry, 15 mass spectra of a TPOX allelic ladder were collected. 
A table of the obtained masses for alleles 6, 7, 8, 9, 10, 11, 12, and 13 shows that all alleles were easily segregated and distinguishable (exhibit 61). Statistical analysis of the data found that the standard deviation about the mean for each allele ranged from 20 to 27 Da, or approximately 0.1% relative standard deviation (RSD). The mass between alleles is equal to the repeat unit, which in the case of TPOX is 1,260 Da for an AATG repeat (exhibit 62). Thus, each allele is easily distinguishable. Measurements were made of the same DNA samples over a fairly wide time-span, revealing that masses can be remarkably similar, even when data points are recollected months later. Exhibit 60 compares 57 allele measurements from 6 different TPOX alleles collected 6 months apart. The first data set was collected on October 1, 1998, and the second data set on March 26, 1999. Amazingly enough, some of the alleles had identical measured masses, even though different mass calibration constants (and even different instruments) were used. The bottom line is whether or not a correct genotype can be obtained using this new technology. Exhibit 63 compares the genotypes obtained using a conventional CE separation method and this mass spectrometry technique across 3 STR markers (D16S539, D8S1179, and CSF1PO) and indicates an excellent agreement between the methods. With the CDOJ samples tested, there was complete agreement on all observed genotypes for the STR loci CSF1PO, TH01, and D3S1358 as well as the sex-typing marker amelogenin (exhibits 12, 14-16). Some "gas-phase" dimers and trimers fell into the allele mass range and confused the calling for TPOX (exhibit 13) and D16S539 (exhibit 17) on several samples. Gas-phase dimers and trimers are assay artifacts that result from multiple excess primer molecules colliding in the gas phase and being ionized during the MALDI process. 
A mass offset plot like that shown in exhibit 46 can be used to detect these assay artifacts as they fall outside the tight grouping and inside the 300 Da window. With the CDOJ samples, D7S820 exhibited null alleles (exhibit 18) and FGA had some unique challenges due to its larger size, such as problems with resolution of closely spaced heterozygotes and poorer mass calibration since the measured alleles were further away from the calibration standards (exhibit 19). Thus, when PCR issues such as null alleles are accounted for and smaller loci are used, this mass spectrometry method produces results comparable to traditional methods of STR genotyping.

Data collection speed

The tremendous speed advantage of mass spectrometry can be seen in exhibit 11. Over the course of this project, data collection speed increased by a factor of 10 from about 50 seconds/sample to less than 5 seconds/sample. This speed increase resulted from improved software and hardware on the automated mass spectrometers and from improved sample quality (i.e., better PCR conditions that yielded more product and improved sample cleanup that in turn yielded "cleaner" DNA). With data collection time around 5 seconds per sample, achieving sample throughputs of almost 1,000 samples per hour is possible, and 3,000-4,000 samples per system per day is reasonable when operating at full capacity. Sample backlogs could be erased rather rapidly with this kind of throughput. By way of comparison, it takes an average of 5 minutes to obtain each genotype (assuming a multiplex level of 6 or 7 STRs) using conventional CE methods (exhibit 11). Thus, the mass spectrometry method described in this study is nearly two orders of magnitude faster (about 60-fold at 5 seconds versus 5 minutes) in sample processing time than conventional techniques.
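The mass-window logic discussed under mass accuracy and precision can be illustrated with a minimal sketch. It assumes a simple nearest-expected-mass rule; the function names, the reference allele mass of 20,000 Da, and the default 300 Da window are illustrative choices, not GeneTrace's actual calling software or calibration values. Only the 1,260 Da TPOX repeat mass comes from the text.

```python
# Minimal sketch of allele calling by mass offset (illustrative only).
# REPEAT_MASS_DA is the AATG repeat mass quoted in the text for TPOX;
# the reference mass used in the example below is a made-up placeholder.

REPEAT_MASS_DA = 1260.0


def expected_masses(ref_allele, ref_mass, alleles, repeat_mass=REPEAT_MASS_DA):
    """Expected mass of each allele, spaced by one repeat unit from a reference."""
    return {a: ref_mass + (a - ref_allele) * repeat_mass for a in alleles}


def call_allele(measured_mass, expected, window=300.0):
    """Return (allele, offset) for the nearest expected mass.

    The allele is None when the offset exceeds the window, which flags
    peaks such as gas-phase primer dimers that fall between alleles.
    """
    allele, mass = min(expected.items(), key=lambda kv: abs(kv[1] - measured_mass))
    offset = measured_mass - mass
    if abs(offset) <= window:
        return allele, offset
    return None, offset


if __name__ == "__main__":
    ladder = expected_masses(ref_allele=8, ref_mass=20000.0, alleles=range(6, 14))
    print(call_allele(21280.0, ladder))  # close to allele 9
    print(call_allele(20600.0, ladder))  # too far from any allele to call
```

Because adjacent alleles are separated by a full repeat unit, a peak must be off by more than half a repeat (about 600 Da) before it is closer to the wrong allele; the narrower window simply rejects ambiguous peaks rather than forcing a call.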
---------------------------

Results and Discussion of Multiplex SNPs

Work began on the development of multiplexed SNP assays in the summer of 1998 after notice that a second NIJ grant, Development of Multiplexed Single Nucleotide Polymorphism Assays from Mitochondrial and Y-Chromosome DNA for Human Identity Testing Using Time-of-Flight Mass Spectrometry, had been funded. Excellent progress was made toward the milestones on this grant, but the work was not finished because this grant was prematurely terminated on the part of GeneTrace in the spring of 1999. The completed work focused on two areas: the development of a 10-plex SNP assay from the mtDNA control region using a single amplicon and the development of a multiplex PCR assay from Y-chromosome SNP markers that involved as many as 18 loci amplified simultaneously. This section describes the design aspects of multiplex PCR and SNP assays along with the progress made toward the goal of producing assays that would be useful for high-throughput screening of mitochondrial and Y-chromosome SNP markers. The approach to SNP determination described here has essentially three steps: (1) PCR amplification, (2) phosphatase digestion, and (3) SNP primer extension. Both strands of DNA may be probed simultaneously in this SNP primer extension assay. PCR primers are designed to generate an amplicon that includes one or more SNP sites. The initial PCR reaction is performed with standard (unlabeled) primers. A phosphatase is then added following PCR to remove all remaining dNTPs so that they will not interfere with the single base extension reaction involving ddNTPs. These reactions can all be performed in the same tube or well in a sample tray. A portion of the phosphatase-treated PCR product is then used for the primer extension assay. In the SNP primer extension assay, a special primer containing a biotin moiety at the 5'-end permits solid-phase capture for sample purification prior to mass spectrometry analysis.
This primer hybridizes upstream of the SNP site with the 3'-end immediately adjacent to the SNP polymorphic site. A cleavable nucleotide near the 3'-end allows the 3'-end of the primer to be released from the immobilized portion and reduces the overall mass of the measured DNA molecule (exhibit 21) (Li et al., 1999). The nucleotide(s) complementary to the nucleotide(s) present at the SNP site are inserted during the extension reaction. In the case of a heterozygote, two extension products result. Only a single base is added to the primer during this process because only ddNTPs are used and the dNTPs left over from PCR are hydrolyzed with the phosphatase digestion step. If the extension reaction is not driven to completion (where the primer would be totally consumed), then both primer and extension product (i.e., primer plus single nucleotide) are present after the primer extension reaction. The mass difference between these two DNA oligomers is used to determine the nucleotide present at the SNP site. In the primer extension SNP assay, the primer acts as an internal standard and helps make the measurement more precise. A histogram of mass difference measurements across 200 samples (50 per nucleotide) is shown in exhibit 64. The ddT and ddA differ by only 9 Da and are the most difficult to resolve as heterozygotes or distinguish from one another in terms of mass. As reported in a recently published paper (Li et al., 1999), this approach has been used to reliably determine all four possible SNP homozygotes and all six possible heterozygotes.

Mitochondrial DNA Work

The control region of mtDNA, commonly referred to as the D-loop, is highly polymorphic and contains a number of possible SNP sites for analysis. MITOMAP, an internet database containing fairly comprehensive information on mtDNA, lists 408 polymorphisms over 1,121 nucleotides of the control region (positions 16020-576) that have been reported in the literature (MITOMAP, 1999).
However, many of these polymorphisms are rare and population specific. The present study focused on marker sets from a few dozen well-studied potential SNP sites. Special Agent Mark Wilson from the FBI Laboratory in Washington, D.C., who has been analyzing mtDNA for more than 7 years, recommended a set of 27 SNPs that would give a reasonable degree of discrimination and make the assay about half as informative as full sequencing. His recommended mtDNA sites were positions 16069, 16114, 16126, 16129, 16189, 16223, 16224, 16278, 16290, 16294, 16296, 16304, 16309, 16311, 16319, 16362, 73, 146, 150, 152, 182, 185, 189, 195, 198, 247, and 309. The underlined sites are those reported in a minisequencing assay developed by the Forensic Science Service (Tully et al., 1996). GeneTrace's multiplex SNP typing efforts began with 10 of the SNPs used in the FSS minisequencing assay, since those primer sequences had already been reported and studied together. The reported FSS sequences were modified slightly by removing the poly(T) tail and converting the degenerate bases into the most common sequence variant (identified by examination of MITOMAP information at the appropriate mtDNA position). A cleavable base was also incorporated at varying positions in different primers so that the cleaved primers would be resolvable on a mass scale. Exhibit 26 lists the final primer set chosen for a 10-plex SNP reaction. Eight of the primers detected SNPs on the "heavy" GC-rich strand and two of them identified SNPs on the "light" AT-rich strand of the mtDNA control region (exhibit 65). Five of the primers annealed within hypervariable region I (HV1) and five annealed within hypervariable region II (HV2). All of the 10 chosen SNP sites were transitions of either A to G (purine-to-purine) or C to T (pyrimidine-to-pyrimidine) rather than transversions (purine-to-pyrimidine).
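The mass-difference readout described for the primer extension assay can be sketched as follows. This is an illustration under stated assumptions, not the authors' software: the ddC and ddG increments (about 273 and 313 Da) and the 9 Da gap between ddT and ddA appear in this report, while the specific values 288.2 and 297.2 Da are commonly quoted figures assumed here, and the 3 Da tolerance and function names are hypothetical.

```python
# Sketch: identify the incorporated ddNTP from the primer-to-product mass gap.
# Mass increments in Da; 273.2 (ddC) and 313.2 (ddG) match values in the text,
# 288.2 (ddT) and 297.2 (ddA) are assumed, consistent with the 9 Da gap.

DDNTP_MASS_DA = {"C": 273.2, "T": 288.2, "A": 297.2, "G": 313.2}


def call_base(primer_mass, product_mass, tolerance=3.0):
    """Return the incorporated base, or None if no increment matches.

    The tolerance must stay below 4.5 Da so that ddT and ddA, which differ
    by only 9 Da, cannot both match the same peak.
    """
    diff = product_mass - primer_mass
    base, expected = min(DDNTP_MASS_DA.items(), key=lambda kv: abs(kv[1] - diff))
    return base if abs(expected - diff) <= tolerance else None


def call_genotype(primer_mass, product_masses, tolerance=3.0):
    """Combine one or two extension peaks into a genotype string such as 'C/T'."""
    bases = {call_base(primer_mass, m, tolerance) for m in product_masses}
    bases.discard(None)
    return "/".join(sorted(bases)) or None


if __name__ == "__main__":
    print(call_base(5000.0, 5288.0))                 # single extension peak
    print(call_genotype(5000.0, [5273.3, 5288.1]))   # C/T transition heterozygote
```

The called base is the one added to the primer, so it is the complement of the template base at the SNP site. For the transition sites used in the 10-plex, the two possible products differ by roughly 15 Da (C versus T) or 16 Da (A versus G), which is why the unextended primer is useful as an internal mass reference.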
Besides primer compatibility (i.e., lack of primer dimer formation or hairpins), another important aspect of multiplex SNP primer design is the avoidance of multiple-charged ions. Doubly charged and triply charged ions of larger mass primers can fall within the mass-to-charge range of smaller primers. Depending on the laser energy used and matrix crystallization, the multiple-charged ions can be significantly abundant (exhibit 66). Primer impurities, such as n-1 failure products, can also impact how close together primers can be squeezed on a mass scale. These primer synthesis failure products will be about 300 Da smaller in mass than the full-length primer. Since an extension product ranges from 273 (ddC) to 313 Da (ddG) larger than the primer itself, a minimum of 650-700 Da is needed between adjacent primers (postcleavage mass) if primer synthesis failures exist to avoid any confusion in making the correct SNP genotype call. Primer synthesis failure products were observed to become more prevalent for larger mass primers. Because resolution and sensitivity in the mass spectrometer decrease at higher masses, it is advantageous to keep the multiplexed primers in a fairly narrow mass window and as small as possible. The primers in this study ranged from 1,580 to 6,500 Da. Exhibit 23 displays the expected primer masses for the mtDNA SNP 10-plex along with their doubly and triply charged ions. The smallest four primers, in the mass range of 1,580-3,179 Da, had primer and extension masses that were similar to multiple-charged ions of larger primers. For example, in the bottom panel of exhibit 6, which shows the 10-plex primers, the doubly charged ion from MT4e (3,250 Da), which probes site L00195, fell very close to the singly charged ion from MT3' (3,179 Da), which probes site H16189. The impact of primer impurity products can also be seen in exhibit 6. 
An examination of the extension product region from primer MT7/H00073 (about 6,200 Da) shows two peaks where only one was expected (top panel). The lower mass peak in the doublet is labeled as "+ddC" (6,192 Da), but the larger peak in the doublet was a primer impurity of MT4e/L00195 (6,232 Da). The mass difference between these two peaks was 40 Da or exactly what one would expect for a C/G heterozygote extension of primer MT7/H00073. Thus, to avoid a false positive, it was important to run the 10 primers alone as a negative control to verify any primer impurities. To aid development of this multiplex SNP assay, large quantities of PCR product were produced from K562 genomic DNA (enough for about 320 reactions) and were pooled together so that multiple experiments would have the same starting material. With the K562 amplicon pool, the impact of primer concentration variation was examined without worrying about the DNA template as a variable. The K562 amplicon pool was generated using the PCR primers noted in exhibit 26, which produced a 1,021 bp PCR product that spanned the entire D-loop region (Wilson et al., 1995). Thus, all 10 SNP sites could be examined from a single DNA template. Using ABI's standard sequencing procedures and dRhodamine dye-terminator sequencing kit, this PCR pool was sequenced to verify the identity of the nucleotide at each SNP site in the 10-plex. The sequencing primers were the same as those reported previously (Wilson et al., 1995). Identical results were obtained between the sequencing and the mass spectrometer, which verified the method (exhibit 6). A variety of primer combinations and primer concentrations were tested on the way to obtaining results with the 10-plex. For example, a 4-plex and a 6-plex were developed first with primers that were further apart in terms of mass and, therefore, could be more easily distinguished. An early 6-plex was published in Electrophoresis (Li et al., 1999). 
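The two design constraints described above for multiplex SNP primers (adequate spacing between adjacent primer masses, and avoidance of multiply charged ions that land on lighter primers) can be checked with a short sketch. The primer labels and masses are hypothetical placeholders loosely modeled on figures quoted in the text, not the exhibit 23 values; m/z is approximated as mass divided by charge, and the overlap window around each lighter primer is an assumed choice meant to cover its extension products.

```python
# Sketch: screen a candidate primer set for two mass-scale conflicts.
# All primer names and masses below are hypothetical illustrations.

MIN_SPACING_DA = 650.0  # lower end of the 650-700 Da guideline in the text


def spacing_conflicts(primers, min_spacing=MIN_SPACING_DA):
    """Adjacent primers (by post-cleavage mass) closer than min_spacing."""
    ordered = sorted(primers.items(), key=lambda kv: kv[1])
    return [
        (a, b, mb - ma)
        for (a, ma), (b, mb) in zip(ordered, ordered[1:])
        if mb - ma < min_spacing
    ]


def charge_state_conflicts(primers, charges=(2, 3), window=(-50.0, 350.0)):
    """Multiply charged ions of one primer that fall near another primer.

    The window spans from slightly below the lighter primer to just above
    its heaviest extension product (primer + ~313 Da); it is an assumption.
    """
    conflicts = []
    for heavy, m_heavy in primers.items():
        for z in charges:
            mz = m_heavy / z  # approximation: ignores the mass of charge carriers
            for light, m_light in primers.items():
                if light == heavy:
                    continue
                if m_light + window[0] <= mz <= m_light + window[1]:
                    conflicts.append((heavy, z, round(mz, 1), light))
    return conflicts


if __name__ == "__main__":
    candidate = {"P_small": 3179.0, "P_mid": 5891.0, "P_large": 6500.0}
    print(spacing_conflicts(candidate))
    print(charge_state_conflicts(candidate))
```

A check like this flags problems on paper only. As the text notes, primer impurities still need to be verified empirically by running the primers alone as a negative control.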
Primer concentrations were balanced empirically by first running all primers at 10 pmol and then raising or lowering the amount of primer in the next set to obtain a good balance between those in the multiplex primer mix. In general, a higher amount of primer was required for primers of higher mass. However, this trend did not always hold true, probably because ionization efficiencies in MALDI mass spectrometry differ depending on DNA sequence content. The primer concentrations in the final "optimized" 10-plex ranged from 10 pmol for MT3' (3,179 Da) to 35 pmol for MT7 (5,891 Da). Primer extension efficiencies also varied between primers, making optimization of these multiplexes rather challenging. Originally, this study set out to examine 100 samples, but due to the early termination of the project, researchers were unable to run this multiplex SNP assay across a panel of samples to verify that it worked with more than one sample. Future work could include examination of a set of population samples and correlation to DNA sequencing results. Examination of the impact of SNPs that are close to the one being tested and that might impact primer annealing also needs to be done. In addition, more SNP sites can be developed and the multiplex could be expanded to include a larger number of loci.

Y-Chromosome Work

While the mtDNA work produced an opportunity to examine the mass spectrometry factors in developing an SNP multiplex, this work involved only a single DNA template with multiple SNP probes. A more common situation for multiplex SNP development is multiple DNA templates with one or more SNPs per template. SNP sites may not be closely spaced along the genome and could require unique primer pairs to amplify each section of DNA. To test this multiplex SNP situation, researchers investigated multiple SNPs scattered across the Y chromosome. Through a collaboration with Dr. Oefner and Dr. Underhill, 20 Y-chromosome SNP markers were examined in this study. Dr.
Oefner and Dr. Underhill have identified almost 150 SNP loci on the Y chromosome, some of which have been reported in the literature (Underhill et al., 1997). By examining an initial set of 20 Y SNPs and adding additional markers as needed, researchers attempted to develop a final multiplex set based on about 50 Y SNP loci. The collaboration provided detailed sequence information on bases around the SNP sites (typically several hundred bases on either side of the SNP site), which is important for multiplex PCR primer design. Dr. Oefner also provided a set of 38 male genomic DNA samples from various populations around the world for testing purposes. The sequences were provided in two batches of 10 sequences each. In the first two sets, primer designs were attempted for a 9-plex PCR and a 17-plex PCR, respectively. Due to primer incompatibilities, it was impossible to incorporate all SNPs into each multiplex set. However, with a larger set of sequences to choose from, it is conceivable that much larger PCR multiplexes can be developed. According to Dr. Underhill's nomenclature, the first set of Y SNP markers included the following loci: M9 (C to G), M17 (1 bp deletion, 4Gs to 3Gs), M35 (G to C), M42 (A to T), M45 (G to A), M89 (C to T), M96 (G to C), M122 (T to C), M130 (C to T), and M145 (G to A). The second set of Y SNP markers contained these loci: M119 (A to C), M60 (1 bp insertion, a "T"), M55 (T to C), M20 (A to G), M69 (T to C), M67 (A to T), M3 (C to T), M13 (G to C), M2 (A to G), and M26 (G to A). Multiplex PCR primers were designed with a UNIX version of Primer 3 (release 0.6) (Rozen et al., 1998) that was adapted at GeneTrace by Nathan Hunt to utilize a mispriming library and Perl scripts for input and export of SNP sequences and primer information, respectively. The PCR primer sequences produced by Hunt's program are listed in exhibit 27. The universal tags attached to each primer sequence aid in multiplex compatibility (Shuber et al., 1995).
This tag added 23 bases to the 5'-end of the forward primers and 24 bases to the 5'-end of the reverse primers and, therefore, increased the overall length of PCR products by 47 bp. The addition of the universal tag makes multiplex PCR development much easier and reduces the need to empirically adjust primer concentrations to balance PCR product quantities obtained from multiple loci (Ross et al., 1998). To compare the amplicon yields from various loci amplified in the multiplex PCR, the product sizes were selected to make them resolvable by CE separation. Thus, the PCR product sizes ranged from 148 bp to 333 bp (exhibit 67) using the primers listed in exhibit 27. To make sure that each primer pair worked, each marker was amplified individually as well as in the multiplex set using the same concentration of PCR primers. A substantial amount of primers remaining after PCR indicated that the PCR efficiency was lower for that particular marker (exhibit 68). Researchers were able to demonstrate male-specific PCR with a 17-plex set of PCR primers. The male test sample AM209 from an Amish CEPH family (exhibit 25) produced amplicons for 17 Y SNP loci, while K562 genomic DNA yielded no detectable PCR product because it is female DNA and, therefore, does not contain a Y chromosome (exhibit 7). SNP primers were designed and synthesized for probing the SNP sites either in a singleplex (exhibit 69) or a multiplex (exhibit 70) format. Some additional SNP primers and multiplex PCR primers were also designed for testing 12 autosomal SNPs throughout the human genome with the hope of comparing the informativeness of SNPs to STRs (exhibit 71). Analysis of these same 12 SNPs was recently demonstrated in a multiplex PCR and SNP assay by PerSeptive Biosystems (Ross et al., 1998). Optimal SNP markers for identity testing typically have allele frequencies of 30-70% in a particular human population. 
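To see why allele frequencies in the 30-70% range matter for a biallelic marker, the following sketch computes the chance that two unrelated people share a genotype at a locus. It assumes Hardy-Weinberg proportions and independent loci, which are standard textbook simplifications and not calculations taken from this report; the function names are hypothetical.

```python
# Sketch: genotype match probability for a locus under Hardy-Weinberg
# assumptions, and the combined value across independent loci.

def genotype_match_probability(allele_freqs):
    """Probability that two random individuals share a genotype at one locus.

    Homozygotes occur with frequency p^2 and heterozygotes with 2pq, so the
    match probability is the sum of the squared genotype frequencies.
    """
    homozygous = sum(p ** 4 for p in allele_freqs)
    heterozygous = sum(
        (2 * p * q) ** 2
        for i, p in enumerate(allele_freqs)
        for q in allele_freqs[i + 1:]
    )
    return homozygous + heterozygous


def combined_match_probability(loci):
    """Product of per-locus match probabilities, assuming independent loci."""
    result = 1.0
    for freqs in loci:
        result *= genotype_match_probability(freqs)
    return result


if __name__ == "__main__":
    for p in (0.5, 0.3, 0.1):
        print(p, round(genotype_match_probability([p, 1 - p]), 4))
    print(combined_match_probability([[0.5, 0.5]] * 10))
```

A biallelic SNP is most informative at a frequency of 50%, where the per-locus match probability is 0.375; it rises as one allele becomes rare. The same function accepts any number of alleles, so it can also be applied to multi-allelic markers.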
By way of comparison, highly polymorphic STRs can have 10-15 or more alleles with allele frequencies below 15% (i.e., more alleles and lower allele frequencies). The characteristics of STR and SNP markers are compared in exhibit 72. SNPs have the capability of being multiplexed to a much higher level than STRs; however, more SNP markers are required for the same level of discrimination compared with STRs. Only time will tell what role new SNP markers will have in human identity testing.

---------------------------

References

ABI Prism 310 Genetic Analyzer User's Manual. 1998. Foster City, CA: Applied Biosystems. AmpFlSTR[R] COfiler[TM] PCR Amplification Kit User Bulletin. 1998. Foster City, CA: Applied Biosystems. AmpFlSTR[R] Profiler Plus[TM] PCR Amplification Kit User's Manual. 1998. Foster City, CA: Applied Biosystems. Becker, C.H., J. Li, T.A. Shaler, J.M. Hunter, H. Lin, and J.A. Monforte. 1997. Genetic analysis of short tandem repeat loci by time-of-flight mass spectrometry. In Proceedings from the Seventh International Symposium on Human Identification 1996. Madison, WI: Promega Corporation, 158-162. Becker, C.H. and S.E. Young. 1999. Mass spectrometer. U.S. Patent No. 5,864,137. Braun, A., D.P. Little, and H. Koster. 1997a. Detecting CFTR gene mutations by using primer oligo base extension and mass spectrometry. Clinical Chemistry 43:1151-1158. Braun, A., D.P. Little, D. Reuter, B. Muller-Mysok, and H. Koster. 1997b. Improved analysis of microsatellites using mass spectrometry. Genomics 46:18-23. Butler, J.M., J. Li, T.A. Shaler, J.A. Monforte, and C.H. Becker. 1998. Reliable genotyping of short tandem repeat loci without an allelic ladder using time-of-flight mass spectrometry. International Journal of Legal Medicine 112(1):45-49. Butler, J.M., B.R. McCord, J.M. Jung, J.A. Lee, B. Budowle, and R.O. Allen. 1995. Application of dual internal standards for precise sizing of polymerase chain reaction products using capillary electrophoresis.
Electrophoresis 16:974-980. Butler, J.M., J. Li, J.A. Monforte, and C.H. Becker. 2000. DNA typing by mass spectrometry with polymorphic DNA repeat markers. U.S. Patent No. 6,090,558. Carroll, J.A. and R.C. Beavis. 1996. Using matrix convolution filters to extract information from time-of-flight mass spectra. Rapid Communications in Mass Spectrometry 10:1683-1687. Clark, J.M. 1988. Novel non-templated nucleotide addition reactions catalysed by procaryotic and eucaryotic DNA polymerases. Nucleic Acids Research 16(20):9677-9686. Collins, F.S., L.D. Brooks, and A. Chakravarti. 1998. A DNA polymorphism discovery resource for research on human genetic variation. Genome Research 8(12):1229-1231. Fregeau, C.J. and R.M. Fourney. 1993. DNA typing with fluorescently tagged short tandem repeats: A sensitive and accurate approach to human identification. BioTechniques 15(1):100-119. GenePrint[TM] STR Systems Technical Manual. 1995. Madison, WI: Promega Corporation, Part# TMD004. Haff, L.A. and I.P. Smirnov. 1997. Single-nucleotide polymorphism identification assays using a thermostable DNA polymerase and delayed extraction MALDI-TOF mass spectrometry. Genome Research 7(4):378-388. Hammond, H.A., L. Jin, Y. Zhong, C.T. Caskey, and R. Chakraborty. 1994. Evaluation of 13 short tandem repeat loci for use in personal identification applications. American Journal of Human Genetics 55:175-189. Kimpton, C.P., P. Gill, A. Walton, A. Urquhart, E.S. Millican, and M. Adams. 1993. Automated DNA profiling employing multiplex amplification of short tandem repeat loci. PCR Methods and Applications 3:13-22. Lazaruk, K., P.S. Walsh, F. Oaks, D. Gilbert, B.B. Rosenblum, S. Menchen, D. Scheibler, H.M. Wenz, C. Holt, and J. Wallin. 1998. Genotyping of forensic short tandem repeat (STR) systems based on sizing precision in a capillary electrophoresis instrument. Electrophoresis 19(1):86-93. Li, H., L. Schmidt, M.-H. Wei, T. Hustad, M.I. Lerman, B. Zbar, and K. Tory. 1993.
Three tetranucleotide polymorphisms for loci: D3S1352, D3S1358, D3S1359. Human Molecular Genetics 2:1327. Li, J., J.M. Butler, Y. Tan, H. Lin, S. Royer, L. Ohler, T.A. Shaler, J.M. Hunter, D.J. Pollart, J.A. Monforte, and C.H. Becker. 1999. Single nucleotide polymorphism determination using primer extension and time-of-flight mass spectrometry. Electrophoresis 20(6):1258-1265. Lindqvist, A.-K.B., P.K.E. Magnusson, J. Balciuniene, C. Wadelius, E. Lindholm, M.E. Alarcon-Riquelme, and U.B. Gyllensten. 1996. Chromosome-specific panels of tri- and tetranucleotide microsatellite markers for multiplex fluorescent detection and automated genotyping: evaluation of their utility in pathology and forensics. Genome Research 6:1170-1176. MITOMAP: A Human Mitochondrial Genome Database. Center for Molecular Medicine, Emory University, Atlanta, GA, USA. http://www.gen.emory.edu/mitomap.html. 1999. Monforte, J.A., C.H. Becker, T.A. Shaler, and D.J. Pollart. 1997. Oligonucleotide sizing using immobilized cleavable primers. U.S. Patent No. 5,700,642. Monforte, J.A., T.A. Shaler, Y. Tan, and C.H. Becker. 1999. Methods of preparing nucleic acids for mass spectrometric analysis. U.S. Patent No. 5,965,363. Ross, P.L. and P. Belgrader. 1997. Analysis of short tandem repeat polymorphisms in human DNA by matrix-assisted laser desorption/ionization mass spectrometry. Analytical Chemistry 69(19):3966-3972. Ross, P., L. Hall, I. Smirnov, and L. Haff. 1998. High level multiplex genotyping by MALDI-TOF mass spectrometry. Nature Biotechnology 16:1347-1351. Rozen, S., and H.J. Skaletsky. 1998. Primer3. Code available at http://www.genome.wi.mit.edu/genome_software/other/primer3.html. Ruitberg, C.M., D.J. Reeder, and J.M. Butler. 2001. STRBase: A short tandem repeat DNA database for the human identity testing community. Nucleic Acids Research 29(1):320-322. Schumm, J.W., A.W. Lins, K.A. Micka, C.J. Sprecher, D.R. Rabbach, and J.W. Bacher. 1997.
Automated fluorescent detection of STR multiplexes--development of the GenePrint[TM] PowerPlex[TM] and FFFL multiplexes for forensic and paternity applications. In Proceedings from the Seventh International Symposium on Human Identification 1996. Madison, WI: Promega Corporation, 70-88. Shaler, T.A., J.N. Wickham, K.A. Sannes, K.J. Wu, and C.H. Becker. 1996. Effect of impurities on the matrix-assisted laser desorption mass spectra of single-stranded oligonucleotides. Analytical Chemistry 68(3):576-579. Shuber, A.P., V.J. Grondin, and K.W. Klinger. 1995. A simplified procedure for developing multiplex PCRs. Genome Research 5:488-493. Sullivan, K.M., A. Mannucci, C.P. Kimpton, and P. Gill. 1993. A rapid and quantitative DNA sex test: fluorescence-based PCR analysis of X-Y homologous gene amelogenin. BioTechniques 15:637-641. Taranenko, N.I., V.V. Golovlev, S.L. Allman, N.V. Taranenko, C.H. Chen, J. Hong, and L.Y. Chang. 1998. Matrix-assisted laser desorption/ionization for short tandem repeat loci. Rapid Communications in Mass Spectrometry 12(8):413-418. Tully, G., K.M. Sullivan, P. Nixon, R.E. Stones, and P. Gill. 1996. Rapid detection of mitochondrial sequence polymorphisms using multiplex solid-phase fluorescent minisequencing. Genomics 34(1):107-113. Underhill, P.A., L. Jin, A.A. Lin, S.Q. Mehdi, T. Jenkins, D. Vollrath, R.W. Davis, L.L. Cavalli-Sforza, and P.J. Oefner. 1997. Detection of numerous Y chromosome biallelic polymorphisms by denaturing high-performance liquid chromatography. Genome Research 7(10):996-1005. Vestal, M.L., P. Juhasz, and S.A. Martin. 1995. Delayed extraction matrix-assisted laser desorption time-of-flight mass spectrometry. Rapid Communications in Mass Spectrometry 9:1044-1050. Walsh, P.S., N.J. Fildes, and R. Reynolds. 1996. Sequence analysis and characterization of stutter products at the tetranucleotide repeat locus vWA. Nucleic Acids Research 24(14):2807-2812. Wilson, M.R., D. Polanskey, J.M. Butler, J.A. DiZinno, J.
Replogle, and B. Budowle. 1995. Extraction, PCR amplification, and sequencing of mitochondrial DNA from human hair shafts. BioTechniques 18:662-669. Wu, K.J., T.A. Shaler, and C.H. Becker. 1994. Time-of-flight mass spectrometry of underivatized single-stranded DNA oligomers by matrix-assisted laser desorption. Analytical Chemistry 66:1637-1645. Wu, K.J., A. Steding, and C.H. Becker. 1993. Matrix-assisted laser desorption time-of-flight mass spectrometry of oligonucleotides using 3-hydroxypicolinic acid as an ultraviolet-sensitive matrix. Rapid Communications in Mass Spectrometry 7:142-146. --------------------------- Published Papers and Presentations From 1997 to 1999, six publications resulted from the work funded by NIJ, and at least one more manuscript is in preparation. All articles were published in journals or for conference proceedings that are accessible and frequented by forensic DNA scientists to ensure proper dissemination of the information. Butler, J.M., J. Li, T.A. Shaler, J.A. Monforte, and C.H. Becker. 1998. Reliable genotyping of short tandem repeat loci without an allelic ladder using time-of-flight mass spectrometry. International Journal of Legal Medicine 112 (1): 45-49. Butler, J.M., J. Li, J.A. Monforte, C.H. Becker, and S. Lee. 1998. Rapid and automated analysis of short tandem repeat loci using time-of-flight mass spectrometry. In Proceedings of the Eighth International Symposium on Human Identification 1997. Madison, WI: Promega Corporation, 94- 101. Butler, J.M., K.M. Stephens, J.A. Monforte, and C.H. Becker. 1999. High-throughput STR analysis by time-of-flight mass spectrometry. In Proceedings of the Second European Symposium on Human Identification 1998. Madison, WI: Promega Corporation, 121-130. Butler, J.M., and C.H. Becker. 1999. High-throughput genotyping of forensic STR and SNP loci using time-of-flight mass spectrometry. In Proceedings of the Ninth International Symposium on Human Identification 1998. 
Madison, WI: Promega Corporation, 43-51. Li, J., J.M. Butler, Y. Tan, H. Lin, S. Royer, L. Ohler, T.A. Shaler, J.M. Hunter, D.J. Pollart, J.A. Monforte, and C.H. Becker. 1999. Single nucleotide polymorphism determination using primer extension and time-of-flight mass spectrometry. Electrophoresis 20:1258-1265. Butler, J.M. 1999. STR analysis by time-of-flight mass spectrometry. Profiles in DNA 2(3): 3-6. In addition, one patent was submitted based on work funded by NIJ. This patent describes the PCR primer sequences used to generate smaller amplicons for 33 different STR loci along with representative mass spectrometry results. The sequences for multiple cleavable primers are also described, although this proprietary chemistry is the subject of U.S. Patent 5,700,642, which was issued in December 1997. The process of multiplexing STR loci by interleaving the alleles on a compressed mass scale is also claimed by U.S. Patent 6,090,558 (Butler et al., 2000). During the course of this NIJ grant, research findings were presented to the forensic DNA community at the following scientific meetings: o Eighth International Symposium on Human Identification (September 20, 1997) o San Diego Conference, Nucleic Acid Technology: The Cutting Edge of Discovery (November 7, 1997) o NIJ Research Committee (February 8, 1998) o American Academy of Forensic Sciences (February 13, 1998) o Southwest Association of Forensic Scientists DNA training workshop (April 23, 1998) o California Association of Criminalists DNA training workshop (May 6, 1998) o National Conference on the Future of DNA (May 22, 1998) o Florida DNA Training Session (May 22, 1998) o American Society of Mass Spectrometry (June 4, 1998) o Second European Symposium on Human Identification (June 12, 1998) o IBC DNA Forensics Meeting (July 31, 1998) o Ninth International Symposium on Human Identification (October 8, 1998) o Fourth Annual CODIS User's Group Meeting (November 20, 1998) o NIJ Research Committee (February 15, 
1999) In addition, the authors participated in NIJ's "Technology Saves Lives" Technology Fair on Capitol Hill in Washington, D.C., March 30-31, 1998, an event that provided excellent exposure for NIJ to Congress. Here, one of the DNA sample preparation robots was demonstrated in the lobby of the Rayburn Building. In September 1998, an 11-minute video was also prepared to illustrate some of the advantages of mass spectrometry for high-throughput DNA typing. --------------------------- About the National Institute of Justice NIJ is the research and development agency of the U.S. Department of Justice and is the only Federal agency solely dedicated to researching crime control and justice issues. NIJ provides objective, independent, nonpartisan, evidence-based knowledge and tools to meet the challenges of crime and justice, particularly at the State and local levels. NIJ's principal authorities are derived from the Omnibus Crime Control and Safe Streets Act of 1968, as amended (42 U.S.C. sections 3721-3722). NIJ's Mission In partnership with others, NIJ's mission is to prevent and reduce crime, improve law enforcement and the administration of justice, and promote public safety. By applying the disciplines of the social and physical sciences, NIJ-- o Researches the nature and impact of crime and delinquency. o Develops applied technologies, standards, and tools for criminal justice practitioners. o Evaluates existing programs and responses to crime. o Tests innovative concepts and program models in the field. o Assists policymakers, program partners, and justice agencies. o Disseminates knowledge to many audiences. 
NIJ's Strategic Direction and Program Areas NIJ is committed to five challenges as part of its strategic plan: 1) rethinking justice and the processes that create just communities; 2) understanding the nexus between social conditions and crime; 3) breaking the cycle of crime by testing research-based interventions; 4) creating the tools and technologies that meet the needs of practitioners; and 5) expanding horizons through interdisciplinary and international perspectives. In addressing these strategic challenges, the Institute is involved in the following program areas: crime control and prevention, drugs and crime, justice systems and offender behavior, violence and victimization, communications and information technologies, critical incident response, investigative and forensic sciences (including DNA), less-than-lethal technologies, officer protection, education and training technologies, testing and standards, technology assistance to law enforcement and corrections agencies, field testing of promising programs, and international crime control. NIJ communicates its findings through conferences and print and electronic media. NIJ's Structure The NIJ Director is appointed by the President and confirmed by the Senate. The NIJ Director establishes the Institute's objectives, guided by the priorities of the Office of Justice Programs, the U.S. Department of Justice, and the needs of the field. NIJ actively solicits the views of criminal justice and other professionals and researchers to inform its search for the knowledge and tools to guide policy and practice. NIJ has three operating units. The Office of Research and Evaluation manages social science research and evaluation and crime mapping research. The Office of Science and Technology manages technology research and development, standards development, and technology assistance to State and local law enforcement and corrections agencies. 
The Office of Development and Communications manages field tests of model programs, international research, and knowledge dissemination programs. NIJ is a component of the Office of Justice Programs, which also includes the Bureau of Justice Assistance, the Bureau of Justice Statistics, the Office of Juvenile Justice and Delinquency Prevention, and the Office for Victims of Crime. To find out more about the National Institute of Justice, please contact: National Criminal Justice Reference Service P.O. Box 6000 Rockville, MD 20849-6000 800-851-3420 e-mail: askncjrs@ncjrs.org To obtain an electronic version of this document, access the NIJ Web site (http://www.ojp.usdoj.gov/nij). If you have questions, call or e-mail NCJRS.