Gene Identification



•  Bioinformatics plays a role in identifying target genes

•  An open reading frame is a significant length of DNA from a start codon to a stop codon

Target genes can be identified by searching online databases for long stretches of DNA that could potentially code for protein

  • These sequences – called open reading frames (ORFs) – begin with a start codon and are uninterrupted by stop codons
  • Open reading frames will typically consist of at least 100 codons (300 nucleotides)
  • Searches can be refined by looking at regions downstream of known promoter sequences and upstream of termination sites

While open reading frames may predict potential coding regions, they do not automatically guarantee the presence of a gene

  • Some long and uninterrupted sequences of DNA may not actually be translated, whilst other short sequences may code for protein



•  Identification of an open reading frame

Any particular stretch of DNA will have six reading frames that could potentially code for a functional protein

  • mRNA is translated in codons (triplets of bases), meaning there are three potential reading frames for a given DNA sequence
  • DNA is double stranded and either strand could include a gene, meaning there are six reading frames in total (2 × 3)
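
The six reading frames can be enumerated programmatically. The following is a minimal sketch in Python, not a reference implementation; the function names `reverse_complement` and `six_frames` and the frame labels (`+1` to `-3`) are illustrative choices, and the input is assumed to contain only the bases A, C, G and T.

```python
# Complement lookup for the four DNA bases (assumes no ambiguous bases such as N).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the opposite strand, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def six_frames(seq: str) -> dict[str, list[str]]:
    """Split a DNA sequence into codons for each of the six reading frames."""
    frames = {}
    strands = (("+", seq.upper()), ("-", reverse_complement(seq)))
    for label, strand in strands:
        # Three possible starting offsets per strand: 2 strands x 3 offsets = 6 frames.
        for offset in range(3):
            codons = [strand[i:i + 3] for i in range(offset, len(strand) - 2, 3)]
            frames[f"{label}{offset + 1}"] = codons
    return frames
```

For example, `six_frames("ATGAAATAG")["+1"]` gives the codons `ATG`, `AAA`, `TAG`, while the other five frames split the same stretch of DNA into entirely different triplets.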

To identify an open reading frame:

  • Locate a sequence corresponding to a start codon in order to determine the reading frame – this will be ATG (sense strand)
  • Read this sequence in base triplets until a stop codon is reached (TGA, TAG or TAA)
  • The longer the uninterrupted sequence, the greater the likelihood that it represents a genuine open reading frame rather than a chance occurrence
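
The steps above can be sketched in Python. This is a simplified illustration under stated assumptions, not how any particular ORF-finding program works: the function name `find_orfs` and the parameter `min_codons` are hypothetical, only the strand supplied is scanned (run it again on the complementary strand to cover all six frames), and the default threshold of 100 codons is the typical minimum length mentioned earlier.

```python
START = "ATG"
STOPS = {"TGA", "TAG", "TAA"}


def find_orfs(seq: str, min_codons: int = 100) -> list[tuple[int, int]]:
    """Return (start, end) positions of candidate ORFs on the given strand.

    start is the index of the A in ATG; end is the index just past the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the current start codon, if one is open
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == START:
                start = i  # the start codon fixes the reading frame
            elif start is not None and codon in STOPS:
                n_codons = (i - start) // 3  # codons read before the stop
                if n_codons >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # resume looking for the next start codon
    return sorted(orfs)
```

With a toy sequence and a lowered threshold, `find_orfs("CCATGAAATTTTAGCC", min_codons=3)` reports a single candidate spanning `ATGAAATTTTAG`; with the default threshold the same call returns no candidates, because the stretch is far too short to be considered significant.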

Certain bioinformatic programs can automatically identify potential ORFs when provided with a candidate sequence

  • Gene sequences are largely conserved – so if an ORF sequence is present in multiple genomes, it likely represents a gene

Identification of an Open Reading Frame


Link:  ORF Finder



•  The target gene is linked to other sequences that control its expression

The expression of a gene is controlled by other additional sequences that regulate transcriptional activity

  • A core promoter sequence functions as an initiation site where a complex of transcription factors is assembled
  • Control elements may serve to regulate the rate of transcription – either increasing (enhancers) or decreasing (silencers)

When a target gene is selected, it is linked to these other sequences to form a recombinant construct capable of expression

  • Including a promoter sequence as part of the construct will ensure the autonomous expression of the target gene
  • Control elements may be included within the construct to allow scientists to determine the rate and timing of expression

Controlling Expression of a Target Gene
