<html>


<!-- Mirrored from www.jenner.ac.uk/YBF/coin.htm by HTTrack Website Copier/3.x [XR&CO'2003], Fri, 25 Jun 2004 08:29:10 GMT -->
<head>
<meta name="GENERATOR" content="Microsoft FrontPage 5.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<meta http-equiv="Content-Language" content="en-us">
<title>Using context and phylogeny to extend HMM methods for protein domains</title>
<style>
<!--
h1
	{margin-bottom:.0001pt;
	page-break-after:avoid;
	font-size:12.0pt;
	font-family:"Times New Roman";
	font-weight:normal; margin-left:0cm; margin-right:0cm; margin-top:0cm}
-->
</style>
</head>

<body>

<div align="center">
  <center>
  <table border="0" cellpadding="0" cellspacing="8" width="98%">
    <tr>
      <td align="right" valign="top" width="20%">&nbsp; </td>
      <td width="15"></td>
      <td bgcolor="lightblue" valign="bottom" width="80%">
      <p align="center">
<font SIZE="5" face="Times New Roman">
<b>Using context and phylogeny to extend <br> HMM methods for protein domains</b></font></p></td>
    </tr>
    <tr>
      <td bgcolor="lightblue" valign="top" width="20%">
<font FACE="Times New Roman">
<h1 align="center" style="text-align:center"><b><span lang="EN-GB">
<font size="4">Lachlan Coin</font></span></b><font size="4">&nbsp;</font></h1>
<p ALIGN="LEFT">
<b>
Supervisor:</b> Dr Richard Durbin</p>
<p><b>School:</b> Wellcome Trust Sanger Institute</p>
</font>
      <p><font size="3"><br>
      <br>
      </font></td>
      <td width="15"></td>
      <td valign="top" width="80%"><font FACE="Times New Roman">
<font SIZE="2">
<p>&nbsp;</p>
</font>
<p>Protein domains are the structural, functional and evolutionary units of 
proteins. A protein can be regarded as a sequence of its domains. Given a new 
protein sequence, for instance from a genome project, the domains can be 
recognised on the basis of the similarity of sections of the amino acid sequence 
to known domain members. We have explored several methods for extending hidden 
Markov model (HMM) techniques for identifying protein domains.</p>
<p>The first method is based on language modelling from speech recognition. To 
increase word accuracy in speech recognition, language models are used to 
capture the information that certain word combinations are more likely than 
others, thus improving detection based on context. We demonstrate that a similar 
technique can significantly enhance protein domain recognition.</p>
<p>The second method exploits the fact that domains are unevenly distributed 
across taxa. We use a taxon-specific association score to enhance protein 
domain recognition.</p>
<p>We have observed that closely related protein sequences, each distantly 
homologous to a particular protein domain, are often either inconsistently 
identified as containing the domain or not identified at all. Motivated by 
this observation, we have developed a phylogenetic method which scores a 
cluster of closely related proteins against a profile HMM. We show that this 
method can detect distant relationships not currently detectable by a standard 
profile HMM.</p>
      </font></td>
    </tr>
  </table>
  </center>
</div>

</body>


</html>