A major focus of my work has been creating a relational database of genomic sequences and associated information, including expression, divergence, protein function, and positional data. I envision several lines of research dealing directly with the techniques and programs built to handle and query the data; more importantly, observations made using this tool will lead to hypothesis-testing experiments performed at the bench. I am also interested in developing the informatics infrastructure needed to incorporate many other types of research data, including protein structure, pathway, and disease linkage information.
A bioinformatics approach allows large-scale genomic differences between species to be analyzed and compared with polymorphism data within species. I am using these analyses to understand the processes affecting genes and genomes during evolution, including analyses of the physicochemical properties of individual amino acid changes in sets of genes. While constraint and random genetic drift are the primary forces acting on the evolution of gene sequences, positive selection does occur at the molecular level and can play a significant role in shaping specific gene sequences. Large-scale approaches help us quantify the nature of positive and negative selection both within and between species.
Currently, I am using bioinformatic methods to categorize and annotate genes from a major macaque brain cDNA sequencing effort involving research groups in both Japan and the United States. This work runs in parallel with my ongoing database design work and my interest in rapidly evolving genes in primates.