Bottom-up proteomics database search algorithms used for peptide identification cannot comprehensively identify post-translational modifications (PTMs) in a single pass because of high false discovery rates (FDRs). A new approach to database searching enables global PTM (G-PTM) identification by exclusively looking for curated PTMs, thereby avoiding the FDR penalty experienced during conventional variable modification searches. We identified over 2200 unique, high-confidence modified peptides comprising 26 different PTM types in a single-pass database search.

Introduction

Protein post-translational modifications (PTMs) play essential roles in protein signaling, localization, function, degradation, and other important biological processes. Despite the considerable success of protein identification via liquid chromatography–mass spectrometry (LC–MS), most studies do not provide information regarding PTMs on the identified proteins. This is because identifying even a single type of PTM requires specialized procedures and introduces problems with the increased database size required for the search.
Extension of such approaches to analysis of multiple PTMs is generally unrealistic, and consequently, there are few examples where multiple types of PTMs are analyzed in a single experiment. Despite widespread recognition of the importance of PTMs, nearly all database search algorithms still rely solely on primary sequence information and ignore all prior knowledge regarding the presence of PTMs. Two notable exceptions are the ProSight software for top-down proteomics, which performs shotgun annotation of multiple sources of variation contained within the UniProt repository, and X!Tandem, which can make use of annotated PTMs from Swiss-Prot.

Protein Search Databases

Protein FASTA and XML files were obtained from the UniProt repository. We chose to use a subset of the available human protein sequences for the Jurkat sample analyses, including only those with a matching mRNA transcript above 0.1 transcripts per million (TPM).
We used the complete set of available mouse protein sequences for the B6 and CAST islet proteomics. Both sets of protein accession numbers (14 138 entries for Jurkat and 43 238 entries for mouse) included in the databases are supplied in, and the summarized list of target PTM types from the XML file is in.
The database used for the Jurkat samples was the Homo sapiens (Human) reference proteome from UniProt release 201312 (downloaded February 25, 2015), limited to those proteins with mRNA transcript abundances exceeding 0.1 TPM. The database used for the mouse samples was the Mus musculus (Mouse) reference proteome from UniProt release 201502 (downloaded February 25, 2015).

Database Searching

The software program Morpheus (revision 142) was used for all database searching. It can be obtained at. For this work, it was modified to accept UniProt XML in addition to FASTA protein databases. When a UniProt XML database is specified, all curated modifications are extracted. Details of each modification (name, mass shift, target) are read from a local copy of.
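The extraction of curated modifications from a UniProt XML database can be illustrated with a minimal Python sketch. This is not the Morpheus implementation; it assumes the standard UniProt XML layout, in which each curated site appears as a `feature` element of type "modified residue" with a nested `position`, and it ignores other feature types. The entry below (accession `X00001`) is hand-written for illustration and is not real UniProt data, and the function name `extract_curated_ptms` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace used by UniProt XML files.
NS = {"up": "http://uniprot.org/uniprot"}

# Tiny hand-written entry for illustration only (not real UniProt data).
SAMPLE_XML = """<uniprot xmlns="http://uniprot.org/uniprot">
  <entry>
    <accession>X00001</accession>
    <feature type="modified residue" description="Phosphoserine">
      <location><position position="3"/></location>
    </feature>
    <feature type="chain" description="Example protein">
      <location><begin position="1"/><end position="4"/></location>
    </feature>
    <sequence>TESQ</sequence>
  </entry>
</uniprot>"""


def extract_curated_ptms(xml_text):
    """Return {accession: [(position, description), ...]} for modified residues."""
    root = ET.fromstring(xml_text)
    ptms = {}
    for entry in root.findall("up:entry", NS):
        accession = entry.find("up:accession", NS).text
        sites = []
        for feature in entry.findall("up:feature", NS):
            # Assumption: only single-residue "modified residue" features are kept.
            if feature.get("type") != "modified residue":
                continue
            position = feature.find("up:location/up:position", NS)
            if position is not None:
                sites.append((int(position.get("position")), feature.get("description")))
        ptms[accession] = sites
    return ptms


ptms = extract_curated_ptms(SAMPLE_XML)
print(ptms)
```

The mass shift and target residue for each modification name would still need to be looked up separately, as the text describes.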
All valid modifications are added to the variable modifications box in the Morpheus graphical user interface with the prefix “UniProt” and selected by default. During each search, all protein sequences are read along with the locations of selected UniProt variable modifications. The order of the unmodified and modified amino acid residues is reversed during on-the-fly generation of decoy protein sequences. PTMs move with their companion amino acid. For example, the phosphorylated tetrapeptide, TES(UniProt: Phosphoserine)Q, becomes QS(UniProt: Phosphoserine)ET. A critical routine in the code takes a base peptide sequence and generates all of the isoform combinations possible given the variable modifications selected, up to a user-defined limit (1024 by default). This code was modified slightly to consider UniProt modifications only at their location as reported in the database. Otherwise, the logic for combinatorially generating all possible peptide isoforms is identical.

Searches were performed on a Dell Precision T7610 workstation with a Xeon 2.70 GHz processor and 32.0 GB of RAM using 12 cores.
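The decoy reversal and the site-restricted isoform generation described above can be sketched in Python as follows. This is an illustrative sketch rather than the Morpheus code: the function names (`reverse_with_mods`, `site_specific_isoforms`, `render`) and the representation of modifications as a dictionary from 1-based position to modification name are assumptions made for the example.

```python
from itertools import combinations


def reverse_with_mods(sequence, mods):
    """Reverse a sequence so that each PTM stays attached to its own residue.

    mods maps 1-based positions to modification names.
    """
    n = len(sequence)
    reversed_mods = {n - pos + 1: name for pos, name in mods.items()}
    return sequence[::-1], reversed_mods


def site_specific_isoforms(mods, limit=1024):
    """Yield isoforms as {position: name}, modified only at curated sites.

    Every subset of the curated sites is one isoform; generation stops
    once `limit` isoforms have been produced.
    """
    sites = sorted(mods)
    count = 0
    for k in range(len(sites) + 1):
        for chosen in combinations(sites, k):
            if count >= limit:
                return
            count += 1
            yield {pos: mods[pos] for pos in chosen}


def render(sequence, mods):
    """Write a sequence with modification names in parentheses after the residue."""
    return "".join(
        residue + (f"({mods[i]})" if i in mods else "")
        for i, residue in enumerate(sequence, start=1)
    )


curated = {3: "UniProt: Phosphoserine"}
decoy_sequence, decoy_mods = reverse_with_mods("TESQ", curated)
decoy = render(decoy_sequence, decoy_mods)
isoforms = [render("TESQ", m) for m in site_specific_isoforms(curated)]
print(decoy)
print(isoforms)
```

With the tetrapeptide from the text, the decoy renders as QS(UniProt: Phosphoserine)ET, and only two isoforms are generated (unmodified and phosphorylated at the curated serine), because no other residue is considered as a candidate site.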
The following settings were used in all searches: Protease = trypsin (no proline rule); Maximum Missed Cleavages = 2; Initiator Methionine Behavior = variable; Fixed Modifications = carbamidomethylation of C; Variable Modifications = oxidation of M; Maximum Variable Modification Isoforms Per Peptide = 1024; Precursor Mass Tolerance = ±10.0 ppm (monoisotopic); Precursor Monoisotopic Peak Correction = disabled; Product Mass Tolerance = ±0.01 Da (monoisotopic); Maximum False Discovery Rate = 1%. FASTA vP searches additionally used variable phosphorylation of S, T, and Y. G-PTM searches used XML database files rather than FASTA files. Counts of post-translationally modified peptides do not include the oxidation of methionine or the carbamidomethylation of cysteine, as these occur during sample handling and are therefore somewhat uninteresting. Search results are available via FTP using the aforementioned PeptideAtlas data repository hyperlinks. Summary lists of identified proteins including numbers of PSMs (total and modified) are provided in for both Jurkat and mouse, where B6 and CAST mouse data are segregated to allow for their comparison. Total search times are as follows: Jurkat FASTA, 46 min.; Jurkat G-PTM, 30 min.; Jurkat variable phosphorylation, 367 min.; B6 and CAST mouse FASTA, 63 min.; B6 and CAST mouse G-PTM, 62 min.; B6 and CAST mouse variable phosphorylation, 471 min.
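To make the two mass tolerance settings concrete, the short Python sketch below shows how a relative (ppm) precursor tolerance and an absolute (Da) product tolerance are typically evaluated. The function names and the example masses are hypothetical and are not taken from Morpheus.

```python
def within_ppm(observed_mass, theoretical_mass, tolerance_ppm=10.0):
    """True if the relative mass error is within +/- tolerance_ppm."""
    error_ppm = (observed_mass - theoretical_mass) / theoretical_mass * 1e6
    return abs(error_ppm) <= tolerance_ppm


def within_da(observed_mass, theoretical_mass, tolerance_da=0.01):
    """True if the absolute mass error is within +/- tolerance_da."""
    return abs(observed_mass - theoretical_mass) <= tolerance_da


# A 1000 Da precursor measured 0.005 Da high is about 5 ppm off: accepted.
print(within_ppm(1000.005, 1000.0))
# The same precursor measured 0.02 Da high is about 20 ppm off: rejected.
print(within_ppm(1000.02, 1000.0))
# A product ion 0.005 Da off passes the 0.01 Da window.
print(within_da(500.005, 500.0))
```

A ppm tolerance scales with the mass being measured, so the accepted window in Da widens for heavier precursors, whereas the product ion window stays fixed at 0.01 Da.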
Results and Discussion

The G-PTM search strategy limits the expansion of the target and decoy databases, as shown by the sizes of the circles in and by the data in. Despite including 22 540 site-specific human PTMs from 104 different PTM types, the search space for G-PTM increased by only 10% compared to the FASTA search, which did not consider any PTMs. In contrast, the FASTA vP search, with only three variable modifications (phosphorylation of serine, threonine, and tyrosine), increased the search space by 67-fold (see ). The number of variable modification isoforms per peptide was limited to 1024 for this comparison, which is the value often allowed in database searches of this type. This massive, 67-fold expansion of the database for variable phosphorylation substantially increases both the search time (6- to 10-fold) and the error rate for phosphopeptide identification.

We applied the G-PTM search strategy to a large proteomic data set obtained from human Jurkat cells. About 490 000 tandem mass spectra were obtained from a highly fractionated sample of Jurkat cell lysate, and counts of the modified peptides resulting from the G-PTM search are listed in. Within this single-pass database search, over 2200 unique post-translationally modified peptides were identified, encompassing 26 different types of PTMs from five categories (phosphorylation, methylation, acetylation, hydroxylation, and assorted).
These modified peptides would have gone undetected using a typical proteomics database search, but the G-PTM search readily revealed this rich array of PTMs present in the Jurkat cells.

We also applied the G-PTM search strategy to proteomics data sets from mouse pancreatic islets. About 430 000 tandem mass spectra were obtained from highly fractionated samples of CAST and B6 mouse islet lysate, and the counts of modified peptides resulting from the G-PTM search are listed in. Within this single-pass database search, ∼1100 unique post-translationally modified peptides were identified, encompassing 32 different types of PTMs from five categories (phosphorylation, methylation, acetylation, hydroxylation, and assorted). These modified peptides would also have gone undetected using a typical proteomics database search, but the G-PTM search readily revealed this rich array of PTMs present in the mouse islets.

These same tandem mass spectra were searched using Morpheus with the two other search strategies, FASTA and FASTA vP. The results for the three types of searches, for both human and mouse data, are shown in.
FDR is calculated with the target–decoy approach. The tabular data within the figure lists the total number of identified proteins, unique peptides, and peptide spectral matches (PSMs). The “All” columns (orange) include both unmodified and post-translationally modified peptides. These total identifications show similar results across the three search strategies, although the G-PTM search consistently produced more identifications than either the FASTA or FASTA vP searches. As expected for samples such as these without a PTM enrichment step, the majority of identified peptides are unmodified (100%, 97%, and 97% for the FASTA, FASTA vP, and G-PTM search results, respectively; see, which categorizes the identifications by the number of PTMs per peptide). Nonetheless, there are hundreds of modified peptides identified in these samples; see (purple data columns).
The G-PTM search produces substantially more modified peptide spectral matches than the FASTA vP search, and furthermore, these PTM identifications are of much higher confidence (vide infra).

Database search results and measures of their confidences for PTM peptide assignments. Results are shown for the three search types for both human and mouse proteomics data sets. The tabulated results are given for “All” (unmodified and modified peptides) and for “Modified” only peptides. The FDR for the “Modified” peptides was calculated from the numbers of decoy and target identifications meeting the global 1% FDR cutoff (i.e., the FDR for “All” peptide identifications). The FDR values for FASTA vP are 1%, indicating substantially poorer confidence, whereas the FDRs are.
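The way a subset FDR for “Modified” identifications can differ from the global FDR is easiest to see with a small worked example. The Python sketch below is an illustration under stated assumptions, not the authors' code: it estimates FDR as decoys divided by targets, accepts the longest score-ranked list whose global FDR does not exceed 1%, and then recomputes the ratio for the modified subset only. The scores and counts are synthetic toy values, and the function names are hypothetical.

```python
def accepted_at_global_fdr(psms, max_fdr=0.01):
    """Return the top-scoring PSMs whose cumulative decoy/target ratio <= max_fdr.

    psms is a list of (score, is_decoy, is_modified) tuples.
    """
    ranked = sorted(psms, key=lambda p: p[0], reverse=True)
    targets = decoys = cutoff = 0
    for i, (_, is_decoy, _) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            cutoff = i  # longest prefix that still meets the global cutoff
    return ranked[:cutoff]


def fdr(psms, modified_only=False):
    """Decoy/target ratio for all PSMs or for the modified subset; None if no targets."""
    subset = [p for p in psms if p[2]] if modified_only else psms
    decoys = sum(1 for p in subset if p[1])
    targets = len(subset) - decoys
    return decoys / targets if targets else None


# Synthetic toy data: 150 unmodified targets, then one modified decoy and
# two modified targets, then five low-scoring unmodified decoys.
psms = [(1000 - i, False, False) for i in range(150)]
psms += [(700, True, True), (699, False, True), (698, False, True)]
psms += [(600 - i, True, False) for i in range(5)]

accepted = accepted_at_global_fdr(psms)
global_fdr = fdr(accepted)
modified_fdr = fdr(accepted, modified_only=True)
print(len(accepted), global_fdr, modified_fdr)
```

In this toy case the accepted list meets the global 1% cutoff (1 decoy among 152 targets), yet the modified subset has 1 decoy for 2 targets. A list that passes globally can therefore still contain a poorly controlled modified subset, which is why the subset FDR is reported separately.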
We thank Donnie Stapleton for supplying the mouse islet samples, Bosco Ho for providing the software Peptagram for spectrum annotation, and Peter Baker for uploading the data to MS-Viewer. This work was supported by National Institutes of Health (NIH) Grant Nos. P50HG004952, U54DK093467, P01GM081629, and R01GM103315. Was supported by the NIH Genomic Sciences Training Program T32HG002760. Were supported by National Institute of Diabetes and Digestive and Kidney Diseases Grant Nos. R01DK058037, R24DK091207, and R01DK066369.

Author Contributions

M.R.S. and C.D.W. contributed equally to this work. Conceived of the concept of using curated PTMs in database search, performed all database search analyses, and processed the results for publication. Implemented the concept in the Morpheus search software. Helped in the design of the mouse studies and designed protocols for islet isolation and lysate preparation. Performed experimental sample preparation and MS analyses of the mouse samples. Performed experimental sample preparation, RNA-seq, and MS analyses of the human Jurkat cell samples. Directed the entire project.
M.R.S., C.D.W., B.L.F., and M.S. wrote the manuscript with support from all authors.