Department of Genetics, Trinity College, University of Dublin, Dublin 2, IRELAND.
The analysis of large volumes of genomic data generates special computational needs. A Beowulf-type computer cluster was set up for high-performance computing. The program rapaquee improves on existing tools for the efficient parallelisation of similarity searches on such systems.
To investigate the evolution of genomes on a molecular basis, a method for the detection of paralogous blocks was developed. Application to the yeast Saccharomyces cerevisiae showed that fully automatically generated results approximated previously available, manually edited information very well. An improved method for the graphical presentation of duplicated regions was implemented; it can be used through the World Wide Web and allows the exploration of many aspects of the produced results.
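As an illustration of what block detection of this kind can involve, the following is a minimal sketch and not the method developed in this work. It assumes that similarity hits are already available as pairs of gene positions (gene order indices) on two chromosomes, and it groups hits that lie close together on both chromosomes by single-linkage clustering. The function name find_blocks and the parameters max_gap and min_pairs are hypothetical, and no statistical significance test is applied.

```python
from collections import defaultdict
from itertools import combinations


def find_blocks(hits, max_gap=5, min_pairs=3):
    """Group homologous gene pairs into candidate paralogous blocks.

    hits: iterable of (chrom_a, pos_a, chrom_b, pos_b) tuples, where the
    positions are gene order indices along each chromosome (an assumption
    of this sketch, not a format taken from the original work).
    """
    # Collect hits per chromosome pair; blocks never span chromosome pairs.
    by_chrom_pair = defaultdict(list)
    for chrom_a, pos_a, chrom_b, pos_b in hits:
        by_chrom_pair[(chrom_a, chrom_b)].append((pos_a, pos_b))

    blocks = []
    for chroms, points in sorted(by_chrom_pair.items()):
        points = sorted(set(points))
        parent = list(range(len(points)))  # union-find forest

        def find(i):
            # Follow parent links to the root, halving the path on the way.
            while parent[i] != i:
                parent[i] = parent[parent[i]]
                i = parent[i]
            return i

        # Link two hits when they are within max_gap genes on both axes.
        for i, j in combinations(range(len(points)), 2):
            (a1, b1), (a2, b2) = points[i], points[j]
            if abs(a1 - a2) <= max_gap and abs(b1 - b2) <= max_gap:
                parent[find(i)] = find(j)

        clusters = defaultdict(list)
        for i, point in enumerate(points):
            clusters[find(i)].append(point)

        # Keep only clusters with enough gene pairs to count as a block.
        for members in clusters.values():
            if len(members) >= min_pairs:
                blocks.append((chroms, members))
    return blocks


example_hits = [
    ("IV", 10, "XII", 50),
    ("IV", 12, "XII", 53),
    ("IV", 15, "XII", 55),
    ("IV", 40, "XII", 5),   # isolated hit, not part of any block
    ("II", 1, "VII", 1),    # single hit on another chromosome pair
]
example_blocks = find_blocks(example_hits)
```

The pairwise comparison is quadratic in the number of hits per chromosome pair, which is acceptable for a sketch but would need a sorted sweep or a spatial index at genome scale.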
Sequence and mapping data from the public Human Genome Project were subjected to intra-genomic comparison. Previously reported and new paralogous regions of statistically significant sizes were detected. A new resource for the interactive graphical presentation of these blocks at variable levels of resolution was implemented. Further phylogenetic analyses indicated that they contain an excess of gene pairs created in a burst of duplication activity that took place approximately 333-583 Mya, spanning the estimated time of the origin of vertebrates (about 500 Mya).
Tests with other genomes demonstrate the benefits of the graphical presentation and its possible adaptation to inter-genomic comparisons. The flexibility and modularity of the approach make it suitable for future projects.