Gene Discovery

•  Gene function can be studied using model organisms with similar sequences

•  Databases can be searched to compare new sequences with sequences of known function in other organisms

Comparative genomics is a field of bioinformatics in which the genome sequences of different species are compared

  • Many gene sequences are highly conserved, so genes can be identified by matching them with known gene sequences in other species
  • Species with close evolutionary relationships typically share a higher degree of gene sequence homology
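
Sequence homology of this kind is often summarised as percent identity. The short Python sketch below assumes the two sequences have already been aligned to the same length (with '-' marking gaps). Real search tools such as BLAST also compute the alignment itself and attach a statistical score, which this toy function does not attempt. The function name and the sequences are hypothetical and are not real genes.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Return the percentage of aligned positions at which two sequences match.

    Assumes both sequences are already aligned to the same length, with '-'
    marking a gap; gap positions are counted as mismatches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(seq_a)


# Hypothetical toy fragments, not real gene sequences
human_fragment = "ATGGCCCTGTGGATGCGC"
other_fragment = "ATGGCCCTGTGGATCCGC"
print(f"{percent_identity(human_fragment, other_fragment):.1f}% identical")
```

The two fragments differ at a single position out of 18, so the example prints 94.4% identical.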

When an equivalent (homologous) gene is found in another organism, scientists can use that organism to study the gene’s function

  • Human experimentation is subject to greater ethical constraints, so using model organisms is a valid alternative
  • The most useful model organisms are usually those with the highest degree of sequence similarity to humans (e.g. bonobos: > 98%)
  • However, more distantly related species may also be viable and raise fewer ethical concerns (e.g. mice: > 80%)

Evolutionary Origin of Human Proteins as an Indication of Gene Conservation

•  An EST (expressed sequence tag) can be used to identify potential genes

One method of identifying conserved genes in different species is via the use of an expressed sequence tag (EST)

  • An EST is a short sequence of DNA (~200 – 500 nucleotides) that is derived from the sequence of an expressed gene
  • An EST is generated by sequencing the terminal region of a cDNA construct (cDNA is copied from transcribed mRNA)
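
The steps above (mRNA copied into cDNA, then a short read taken from one end) can be sketched in Python as a toy model. The function names and the mRNA are hypothetical, and a 9-base tag stands in for a real EST of ~200 – 500 nucleotides; real ESTs are produced by sequencing cDNA clones in the laboratory, not by string manipulation.

```python
# Complement table for DNA bases
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Complementary DNA strand, written 5' to 3'."""
    return dna.translate(DNA_COMPLEMENT)[::-1]


def reverse_transcribe(mrna: str) -> str:
    """First-strand cDNA: the DNA complement of an mRNA, written 5' to 3'."""
    return reverse_complement(mrna.upper().replace("U", "T"))


def make_est(mrna: str, length: int, from_end: str = "5'") -> str:
    """Toy EST: read `length` bases from one end of the double-stranded cDNA.

    The second cDNA strand has the same sequence as the mRNA (T in place of U),
    so the tag is read from that strand.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    first_strand = reverse_transcribe(mrna)
    second_strand = reverse_complement(first_strand)
    if from_end == "5'":
        return second_strand[:length]
    if from_end == "3'":
        return second_strand[-length:]
    raise ValueError("from_end must be \"5'\" or \"3'\"")


# Hypothetical mature mRNA (introns already spliced out)
mrna = "AUGGCCAUUGUAAUGGGCCGCUGA"
print(make_est(mrna, 9))        # ATGGCCATT
print(make_est(mrna, 9, "3'"))  # GGCCGCTGA
```

Because the second cDNA strand matches the mRNA, the tag reads the same as the start (or end) of the transcript, with T in place of U.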

The EST is used to locate a corresponding gene by searching for its matching sequence within another organism’s genome

  • The efficacy of this process depends on genome size and the presence of introns (introns are spliced out of mRNA, so cDNA does not contain them)
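
The effect of introns can be illustrated with a toy genome. In this minimal Python sketch (the sequences and the function `locate_spliced` are hypothetical), the EST cannot be found as one contiguous string because the intron separates its two halves in the genome; a simple greedy search instead places it as two exon-like blocks. Dedicated spliced aligners such as BLAT also cope with sequencing errors, repeats and splice signals, none of which this sketch attempts.

```python
def locate_spliced(est: str, genome: str, min_block: int = 4):
    """Greedily place an EST on a genome as one or more exon-like blocks.

    Repeatedly takes the longest remaining prefix of the EST (at least
    `min_block` bases) that occurs downstream of the previous block.
    Returns a list of (start, end) genome coordinates, or None if the
    EST cannot be placed this way.
    """
    blocks = []
    est_pos = 0     # how much of the EST has been placed so far
    genome_pos = 0  # only search downstream of the last block
    while est_pos < len(est):
        for end in range(len(est), est_pos + min_block - 1, -1):
            hit = genome.find(est[est_pos:end], genome_pos)
            if hit != -1:
                blocks.append((hit, hit + end - est_pos))
                genome_pos = hit + end - est_pos
                est_pos = end
                break
        else:
            return None  # the rest of the EST matches nowhere downstream
    return blocks


# Hypothetical toy gene: exon 1 + intron (GT...AG) + exon 2
exon1, intron, exon2 = "ATGGCCATTGCA", "GTAAGTTTTTCCTTAG", "TGGGCCGCTGA"
genome = exon1 + intron + exon2
est = exon1 + exon2  # cDNA lacks the intron

print(genome.find(est))             # -1: no contiguous match
print(locate_spliced(est, genome))  # [(0, 12), (28, 39)]
```

The 16-base gap between the two blocks corresponds to the intron, which is absent from the EST.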

Alignment of ESTs to Genome

•  Discovery of genes by EST data mining

EST data mining involves searching genome databases for sequences that match an expressed sequence tag

  • As an EST represents the terminal portion of an expressed gene, an exact sequence match may denote a gene’s presence and location
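
A minimal sketch of EST data mining as an exact string search, in Python. The function name, record IDs and sequences are hypothetical; real database searches (e.g. BLAST against GenBank) tolerate mismatches and score partial matches rather than requiring an exact hit.

```python
def mine_est(est: str, database: dict) -> list:
    """Return (record_id, start) for every exact occurrence of `est` in `database`."""
    if not est:
        raise ValueError("EST must not be empty")
    hits = []
    for record_id, sequence in database.items():
        start = sequence.find(est)
        while start != -1:
            hits.append((record_id, start))
            start = sequence.find(est, start + 1)  # allow overlapping hits
    return hits


# Hypothetical records; real databases hold many millions of sequences
database = {
    "contig_1": "CCGTATGGCCATTGCATTAGG",
    "contig_2": "GGATCCAAGCTTGAATTC",
}
print(mine_est("ATGGCCATT", database))  # [('contig_1', 4)]
```

Each hit reports both which record contains the tag and where, which corresponds to a gene's presence and location.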

EST sequences can be used to establish the function of novel genes:

  • If an EST sequence from a novel gene matches that of a known gene, it may indicate common functionality
  • If the EST returns no significant matches within a database, then no information can be gleaned regarding gene function
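
The two outcomes above can be sketched as a simple decision rule in Python. The annotated genes, their functions and the 80% identity threshold are all hypothetical, and identity is computed over ungapped windows only; real annotation relies on statistical scores such as BLAST E-values rather than a fixed cut-off.

```python
def best_ungapped_identity(est: str, gene: str) -> float:
    """Highest % identity of `est` against any same-length window of `gene` (no gaps)."""
    if not est or len(est) > len(gene):
        return 0.0
    best = 0
    for offset in range(len(gene) - len(est) + 1):
        window = gene[offset:offset + len(est)]
        best = max(best, sum(a == b for a, b in zip(est, window)))
    return 100.0 * best / len(est)


def infer_function(est: str, known_genes: dict, threshold: float = 80.0):
    """Return (function, score) of the best-matching known gene, or (None, score)."""
    best_function, best_score = None, 0.0
    for sequence, function in known_genes.values():
        score = best_ungapped_identity(est, sequence)
        if score > best_score:
            best_function, best_score = function, score
    if best_score >= threshold:
        return best_function, best_score  # match suggests common functionality
    return None, best_score  # no significant match: function unknown


# Hypothetical annotated genes and functions, for illustration only
known_genes = {
    "gene_A": ("ATGGCCATTGCATGGGCCGCTGA", "kinase (hypothetical)"),
    "gene_B": ("ATGAAACCCGGGTTTTAA", "transporter (hypothetical)"),
}
print(infer_function("ATGGCCATAGCA", known_genes))
print(infer_function("CCCCCCCCCCCC", known_genes))
```

The first call returns the kinase annotation with about 91.7% identity (11 of 12 bases match); the second returns None as the function because no window reaches the threshold.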

GenBank is an example of a database that can be used for EST data mining of gene locations and functions

  • To search the GenBank database specifically for expressed sequence tags, change the search parameter to ‘EST’