Thesica.org, the #1 open access web portal for PhD theses...

Computer methods for identifying significant features in protein sequences

Timothy Andrew Lodge

1994

Department of Biochemistry and Molecular Biology, University of Leeds, Leeds LS2 9JT, UNITED KINGDOM.

ABSTRACT

The research described in this thesis falls under two broad headings: the definition of discriminating motif sets for protein families, and software development. Here, the phrase "motif set" refers to a combination of features in the amino acid sequences of a family of proteins that is diagnostic of family membership and therefore has predictive value in identifying new family members.

Under the first heading, a number of motif sets are described in detail, while others are included as an appendix in a format compatible with the PRINTS motif database. All these studies involved the multiple alignment of protein sequences extracted from the database and the use of database scanning techniques. These motif sets have made it possible to identify new members of protein families, and they may also supply valuable information for exploring the possible function and structure of those families.
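As an illustration of how a motif set can be diagnostic of family membership, the following minimal sketch builds a residue frequency matrix from each ungapped motif alignment and then scans a candidate sequence for all motifs in order. It is not the method used in the thesis: the function names, the toy motifs, the greedy in-order search, and the 0.8 score threshold are all assumptions made for this example, and it is written in Python for brevity.

```python
from collections import Counter

def build_frequency_matrix(aligned_segments):
    """Return one Counter of residue counts per column of an ungapped motif."""
    length = len(aligned_segments[0])
    if any(len(seg) != length for seg in aligned_segments):
        raise ValueError("motif segments must all have the same length")
    return [Counter(seg[i] for seg in aligned_segments) for i in range(length)]

def best_window(sequence, matrix):
    """Slide the matrix along the sequence; return (best score, start index).

    The start index is -1 if no window scores above zero.
    """
    width = len(matrix)
    best = (0, -1)
    for start in range(len(sequence) - width + 1):
        # A Counter returns 0 for residues never seen in that column.
        score = sum(matrix[i][sequence[start + i]] for i in range(width))
        if score > best[0]:
            best = (score, start)
    return best

def matches_motif_set(sequence, matrices, min_fraction=0.8):
    """True if every motif is found, in order, above the (assumed) threshold."""
    previous_end = 0
    for matrix in matrices:
        max_score = sum(max(column.values()) for column in matrix)
        score, start = best_window(sequence[previous_end:], matrix)
        if start < 0 or score < min_fraction * max_score:
            return False
        # The next motif must lie downstream of this one (greedy simplification).
        previous_end += start + len(matrix)
    return True

# Hypothetical toy data: two short motifs and two candidate sequences.
motif_a = build_frequency_matrix(["GDSGG", "GDSGA", "GDSGG"])
motif_b = build_frequency_matrix(["HCAAH", "HCAGH", "HCAAH"])
motif_set = [motif_a, motif_b]

candidate_member = "MKTGDSGGPLVVHCAAHLK"      # contains both motifs
candidate_non_member = "MKTGDSGGPLVVAAAAAAA"  # contains only the first motif

print(matches_motif_set(candidate_member, motif_set))
print(matches_motif_set(candidate_non_member, motif_set))
```

Requiring every motif in the set, rather than any single one, is what gives the combination its discriminating power: a sequence that matches one motif by chance is unlikely to match them all in the right order.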

A number of sequence analysis software packages are also described. They include both novel software and reworkings of older algorithms, with additions that make them more efficient, better suited to modern requirements, and free of existing problems. In the former category, new sequence alignment programs have been developed which integrate structural information (where available) with sequence and physicochemical properties. A number of programs are also discussed that allow the display and manipulation of a variety of sequence parameters, such as hydropathy and positional variability, which are very useful for motif definition. All these programs are written in C, the majority make use of the X/Motif programming libraries where appropriate, and they are available on a variety of hardware platforms.
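The two sequence parameters named above can be computed in a few lines. The sketch below is an assumption-laden illustration, not the thesis software (which is written in C): it uses the Kyte-Doolittle hydropathy scale with a sliding-window mean, and it measures positional variability as the number of distinct residues in each alignment column. The abstract specifies neither the scale nor the variability measure, and the function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=7):
    """Mean hydropathy of every full window, one value per window start.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    values = [KYTE_DOOLITTLE[residue] for residue in sequence]
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]

def positional_variability(alignment):
    """Count distinct residues in each alignment column, ignoring gaps ('-')."""
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("aligned sequences must all have the same length")
    return [len({row[i] for row in alignment} - {"-"}) for i in range(width)]

# Hypothetical toy inputs.
profile = hydropathy_profile("MKTLLVIAGGDSE", window=5)
variability = positional_variability(["ACDE", "ACDF", "AC-E"])
print(profile)
print(variability)
```

Columns with low variability and stretches with a consistent hydropathy pattern are natural candidates when choosing where to place a motif, which is why such profiles are useful during motif definition.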

The ADSP system has also been rewritten to make it more efficient, and it has been ported to the UNIX operating system to make it accessible to a larger number of users.