The proteome can be viewed as the total content and state of the proteins in a living system. It is dynamic, responding to changes in the cellular environment caused by normal and pathogenic physiological signals associated with growth and development, cell division, immunological response, reproduction, bioenergetic state, and other cellular functions. The sequencing of the human genome was a significant milestone in scientific endeavor. Starting from this genetic archive, the regulation of transcription, translation, processing, and post-translational modification offers many levels of control, which together determine the state of the cellular proteome that actually conducts the business of life. Work in the Mass Spectrometry facility centers on applying the tools of proteomics to reveal the dynamic qualitative and quantitative state of the proteins associated with cellular functions. The maintenance and operation of mass spectrometers are central to this work, and as Director of the Biological Mass Spectrometry and Proteomics Facility, my goal is to make this technology accessible, understandable, and above all, useful to the UMKC educational and research community.
My own research, however, is currently focused on the problem of identifying proteins in very divergent organisms. In proteomics facilities worldwide, protein identification is routinely accomplished by comparing mass spectrometry data to predicted, virtual spectra based on the known protein sequences held in protein databases. MS/MS data are particularly powerful because they hold not only peptide mass information but also peptide fragment information. Peptides have fairly predictable fragmentation behavior in mass spectrometers, so a simple comparison efficiently finds matches to the MS data, provided the corresponding sequences are represented in the protein database. This makes it an ideal bioinformatics problem for automated computer searching. Success under this paradigm requires that the peptides of the unknown protein be represented in the database, which is generally true for commonly studied species. In fact, even moderate sequence divergence, such as polymorphism in some peptides of a protein, is typically compensated for by the presence of other conserved peptides in the same protein, allowing the protein identification to be made. More extreme differences, like those encountered in very divergent species, can severely diminish confidence in protein identifications by reducing the number of matching peptides available. Interestingly, we have found that we can correctly predict the sequences of naturally occurring proteins based on well-documented substitution patterns exhibited by known proteins. Using this strategy, we have designed a web-based virtual sequence database engineering program that successfully predicts naturally occurring protein sequences that are not represented in the protein databases but are present and identifiable in mass spectrometry data (http://ms-virtmorph.umkc.edu/). We are developing this technology to improve the identification of divergent proteins that are not present in existing databases.
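The general idea of database matching, and of widening a database with predicted sequence variants, can be illustrated with a minimal Python sketch. This is not the code behind the web tool mentioned above. The function names, the tiny `SUBSTITUTIONS` table, the toy protein sequence, and the 0.01 Da tolerance are all assumptions made for illustration, and the scoring uses only singly charged b-ions, whereas a real search engine would use a far richer model.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728

# Hypothetical, deliberately tiny table of conservative substitutions.
# A real table would be derived from documented substitution patterns.
SUBSTITUTIONS = {"I": "V", "V": "I", "D": "E", "E": "D",
                 "S": "T", "T": "S", "K": "R", "R": "K"}


def tryptic_digest(protein):
    """Cleave after K or R unless followed by P (no missed cleavages)."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in peptide) + WATER


def b_ions(peptide):
    """Singly charged b-ion m/z values (N-terminal fragments)."""
    ions, total = [], PROTON
    for aa in peptide[:-1]:
        total += RESIDUE_MASS[aa]
        ions.append(total)
    return ions


def single_substitution_variants(peptide):
    """All peptides that differ from the input by one tabulated substitution."""
    variants = set()
    for i, aa in enumerate(peptide):
        for sub in SUBSTITUTIONS.get(aa, ""):
            variants.add(peptide[:i] + sub + peptide[i + 1:])
    return variants


def count_matches(observed, predicted, tol):
    """Number of predicted fragments found among the observed fragments."""
    return sum(any(abs(o - p) <= tol for o in observed) for p in predicted)


def best_match(precursor_mass, observed_fragments, candidates, tol=0.01):
    """Filter candidates by precursor mass, then rank by matched b-ions.

    Returns (matched_fragment_count, peptide), or None if no candidate
    has a mass within the tolerance.
    """
    scored = [
        (count_matches(observed_fragments, b_ions(p), tol), p)
        for p in candidates
        if abs(peptide_mass(p) - precursor_mass) <= tol
    ]
    return max(scored) if scored else None


if __name__ == "__main__":
    database = tryptic_digest("MKTAYIAK")      # toy "known" protein
    expanded = set(database)
    for pep in database:
        expanded |= single_substitution_variants(pep)

    # Simulated spectrum of a divergent peptide (I replaced by V).
    precursor = peptide_mass("TAYVAK")
    fragments = b_ions("TAYVAK")

    print(best_match(precursor, fragments, database))   # None
    print(best_match(precursor, fragments, expanded))   # (5, 'TAYVAK')
```

In this toy case the divergent peptide has no match in the original database because its mass differs from that of the known peptide. After expansion, two variants (SAYIAK and TAYVAK) share the observed precursor mass, and only the fragment information separates them, which mirrors why MS/MS data carry more identifying power than peptide mass alone.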