
Picking the high hanging fruit: Automated ways to annotate awkward genes
SEÁN Ó hÉIGEARTAIGH

2011

Department of Genetics, Trinity College, University of Dublin, Dublin 2, IRELAND.

ABSTRACT

In Chapter 2 I describe the development of software called SearchDOGS (Database of Orthologous Genomic Segments). By identifying regions of conserved local synteny across species using the synteny information contained in the Yeast Gene Order Browser (YGOB) and combining this information with standard BLAST sequence similarity searches, SearchDOGS is able to identify unannotated genes in published yeast genomes with a very high degree of sensitivity. It is particularly effective for identifying short or highly diverged genes that are often missed using standard methods. Using this approach, we have identified 595 unannotated genes across eleven yeast species, including two previously unidentified genes in S. cerevisiae. Among these, we identify a number of genes coding for the mating pheromone a-factor in six species including Kluyveromyces lactis; these tiny genes are notoriously difficult to identify by standard methods.
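
The idea of using conserved local synteny to decide where to look for a missing gene can be illustrated with a small sketch. This is not the SearchDOGS implementation; it is a toy model under stated assumptions: each genome is represented as a list of ortholog-group labels in chromosomal order, and the function name `syntenic_gaps` and the parameters `window` and `max_gap` are hypothetical. It only reports which flanking genes bracket a region worth searching; the sequence similarity search itself (for example with BLAST) is not shown.

```python
def syntenic_gaps(reference, target, window=3, max_gap=2):
    """Find genes annotated in `reference` but absent from `target`
    whose neighbours are still adjacent (or nearly so) in `target`.

    Both genomes are lists of ortholog-group labels in chromosomal order
    (an assumed toy representation). Returns a list of tuples
    (missing_label, left_anchor, right_anchor).
    """
    # Position of each annotated gene in the target genome.
    target_pos = {label: i for i, label in enumerate(target)}
    hits = []
    for i, label in enumerate(reference):
        if label in target_pos:
            continue  # already annotated in the target
        # Nearest neighbour on each side (within `window` genes)
        # that does have an ortholog in the target.
        left_side = reversed(reference[max(0, i - window):i])
        right_side = reference[i + 1:i + 1 + window]
        left = next((g for g in left_side if g in target_pos), None)
        right = next((g for g in right_side if g in target_pos), None)
        if left is None or right is None:
            continue
        # Synteny is considered conserved if the two anchors lie close
        # together in the target; the interval between them is then a
        # candidate region for a sequence similarity search.
        if abs(target_pos[left] - target_pos[right]) <= max_gap:
            hits.append((label, left, right))
    return hits


reference = ["A", "B", "C", "D", "E"]
target = ["A", "B", "D", "E"]  # "C" has no annotated ortholog here
print(syntenic_gaps(reference, target))  # [('C', 'B', 'D')]
```

In this toy case the anchors B and D are adjacent in the target genome, so the interval between them is where a short or diverged ortholog of C would be sought.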

In Chapter 3 I describe the adaptation of SearchDOGS to identify missing genes in bacterial genomes. Bacterial SearchDOGS is a standalone, downloadable package that can be used in conjunction with any set of bacterial genomes that span a suitable evolutionary range, including unpublished or private data. The software automatically generates a pillar homology structure between the genomes in order to calculate the synteny information that is central to the SearchDOGS procedure. HTML results files are generated for each species, including BLAST links, Ka/Ks protein sequence conservation estimates and other relevant information for each candidate gene identified, in order to allow the user to make an informed decision regarding the validity of each candidate gene. Using this approach, I identified 171 gene candidates in the Shigella boydii sb227 genome, including 62 candidates of length

In Chapter 4 I undertake a comparative analysis in the Saccharomycetaceae of another type of “awkward gene” that is difficult to annotate and sometimes poorly understood: genes that undergo programmed ribosomal frameshifting. I expand on previous studies of three yeast chromosomal genes, OAZ1, EST3 and ABP140, that were previously known to contain a programmed frameshifting signal. I describe a further example of unusual gene evolution, URA6, that may be a case of a gene split or a programmed ribosomal frameshift. In the case of ABP140, I identify previously unidentified cases of retention of truncated ohnologs following whole genome duplication. The URA6 locus is particularly notable as it appears to require an unlikely number of events to produce the distribution of full-length and split/frameshifted orthologs, regardless of whether this is an example of a gene split or of a frameshifting locus.
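
To show why frameshifted genes are awkward to annotate, the following minimal sketch translates a sequence with and without a +1 frameshift. It assumes the standard genetic code; the sequence is an invented toy example, not taken from OAZ1, EST3, ABP140 or URA6, and the function `translate` with its `shift_at` and `shift` parameters is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq, shift_at=None, shift=1):
    """Translate `seq` from index 0 until a stop codon or the end.

    If `shift_at` is given, the reader skips `shift` nucleotides once
    when it reaches that index, mimicking a programmed frameshift.
    """
    protein = []
    i = 0
    while i + 3 <= len(seq):
        if shift_at is not None and i == shift_at:
            i += shift       # slip forward into the new reading frame
            shift_at = None  # the shift happens only once
            continue
        aa = CODON_TABLE[seq[i:i + 3]]
        if aa == "*":
            break            # stop codon in the current frame
        protein.append(aa)
        i += 3
    return "".join(protein)


toy = "ATGGCTTGAAGGTTAA"           # invented sequence for illustration
print(translate(toy))              # MA   (frame 0 ends at TGA)
print(translate(toy, shift_at=6))  # MAEG (+1 shift reads past the stop)
```

A conventional ORF finder sees only the short frame-0 product, which is one reason such loci are easily misannotated as truncated or split genes.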

Appendix II includes the manuscript “Evolutionary erosion of yeast sex chromosomes by mating-type switching accidents” by Jonathan L. Gordon, David Armisén, Estelle Proux-Wéra, Seán S. ÓhÉigeartaigh, Kevin P. Byrne, and Kenneth H. Wolfe, recently accepted for publication by the Proceedings of the National Academy of Sciences. In a project involving multiple members of the laboratory, we annotated the genomes of seven yeast species that we sequenced, and studied the evolution of the mating-type locus in these species. My role in this project was the editing and correction of sequence data for the new Saccharomycetaceae species included in this study (soon to be publicly available on YGOB). In addition, SearchDOGS was included as an annotation step in the Yeast Genome Annotation Pipeline (YGAP) that was used to annotate these genomes.