
Gene Predictions

 

The unigene set (contigs and singletons) from the super-assembly was searched for repeat elements and masked using RepeatMasker. For repeat masking, a custom library was built from Repbase together with all plant repeats. Contigs from the masked unigene set were then searched for gene models using three eukaryotic de novo gene prediction tools, namely AUGUSTUS, GENSCAN and GlimmerHMM, trained on Arabidopsis thaliana. The gene models predicted by the three tools were compared with one another by reciprocal BLAST similarity searches with stringent parameters, and a common gene pool was assembled from the gene models supported by at least two prediction tools.
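The "supported by at least two tools" rule can be illustrated with a minimal sketch. It assumes the BLAST results have already been parsed and filtered into best-hit tables (query model to best subject model) for each pair of tools; the BLAST thresholds used here are not stated, and the function names and model identifiers below are hypothetical, not taken from the actual pipeline.

```python
from collections import defaultdict


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs where a's best hit is b and b's best hit is a."""
    return {(a, b) for a, b in a_vs_b.items() if b_vs_a.get(b) == a}


def models_supported_by_two_tools(rbh_by_pair):
    """Collect, per tool, the models that have a reciprocal best hit
    in at least one other tool.

    rbh_by_pair maps (tool_x, tool_y) to a set of (model_x, model_y) pairs.
    """
    supported = defaultdict(set)
    for (tool_x, tool_y), pairs in rbh_by_pair.items():
        for model_x, model_y in pairs:
            supported[tool_x].add(model_x)
            supported[tool_y].add(model_y)
    return dict(supported)


# Hypothetical best-hit tables (query -> best subject), already filtered.
aug_vs_gen = {"aug1": "gen1", "aug2": "gen3"}
gen_vs_aug = {"gen1": "aug1", "gen3": "aug9"}
gen_vs_gli = {"gen3": "gli2"}
gli_vs_gen = {"gli2": "gen3"}

rbh_by_pair = {
    ("AUGUSTUS", "GENSCAN"): reciprocal_best_hits(aug_vs_gen, gen_vs_aug),
    ("GENSCAN", "GlimmerHMM"): reciprocal_best_hits(gen_vs_gli, gli_vs_gen),
}
common_pool = models_supported_by_two_tools(rbh_by_pair)
```

In this toy input, aug2 is left out because its hit to gen3 is not reciprocated. The sketch only flags supported models per tool; collapsing matched models from different tools into one gene per locus would be a further step.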

 
         
Repetitive Elements: Repetitive element analysis after super-assembly
 
Total sequences: 4095129
Total length: 1272624566 bp
GC level: 37.76%
Bases masked: 181188734 bp (14.24%)
Class of repeats | Number of elements | Length occupied (bp) | % of sequence | % of all repeats
Retrotransposons | 149021 | 150093339 | 11.79 | 96.99
LTR | 119666 | 145777784 | 11.45 | 94.20
ERVL | 437 | 69203 | 0.01 | 0.04
ERVL-MaLRs | 739 | 122906 | 0.01 | 0.08
ERV_classI | 1851 | 168259 | 0.01 | 0.11
ERV_classII | 2391 | 115750 | 0.01 | 0.07
Gypsy | 487348 | 113658769 | 8.93 | 73.44
Copia | 146650 | 33341979 | 2.62 | 21.54
LINEs | 26223 | 3909068 | 0.31 | 2.53
LINE1 | 18999 | 3307994 | 0.26 | 2.14
LINE2 | 1569 | 137023 | 0.01 | 0.09
L3/CR1 | 727 | 41655 | 0.00 | 0.03
SINEs | 3132 | 406487 | 0.03 | 0.26
ALUs | 1924 | 322322 | 0.03 | 0.21
MIRs | 378 | 49643 | 0.00 | 0.03
DNA elements | 36095 | 4433552 | 0.35 | 2.86
hAT-Charlie | 559 | 55851 | 0.00 | 0.04
TcMar-Tigger | 242 | 35509 | 0.00 | 0.02
Unknown elements | 2002 | 228898 | 0.02 | 0.15
Total repetitive elements | 187118 | 154755789 | 12.16 | ---
Small RNA | 4689 | 713100 | 0.06 | 0.46
Satellite | 2628 | 271700 | 0.02 | 0.18
Simple repeats | 57028 | 3148863 | 0.25 | 2.03
Low complexity | 456896 | 22447631 | 1.76 | ---
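The percentage columns can be recomputed from the raw lengths, as in the short check below. It assumes that "% of sequence" is relative to the total assembly length and that "% of all repeats" is relative to the "Total repetitive elements" length; the second denominator is inferred from the numbers rather than stated on this page. The helper name pct is hypothetical.

```python
# Values copied from the summary lines and the repeat table.
total_length = 1_272_624_566      # Total length (bp)
bases_masked = 181_188_734        # Bases masked (bp)
total_repetitive = 154_755_789    # Total repetitive elements, length (bp)

retrotransposons = 150_093_339
dna_elements = 4_433_552
unknown_elements = 228_898
gypsy = 113_658_769


def pct(part, whole):
    """Percentage of whole, rounded to two decimals as in the table."""
    return round(100 * part / whole, 2)


masked_pct = pct(bases_masked, total_length)
gypsy_pct_of_sequence = pct(gypsy, total_length)
gypsy_pct_of_repeats = pct(gypsy, total_repetitive)  # assumed denominator

# The three top-level classes add up to the reported total length.
top_level_sum = retrotransposons + dna_elements + unknown_elements

print(masked_pct, gypsy_pct_of_sequence, gypsy_pct_of_repeats)
print(top_level_sum == total_repetitive)
```

Under these assumptions the recomputed values agree with the table entries for bases masked and for the Gypsy row, and the lengths for retrotransposons, DNA elements and unknown elements sum to the reported total.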
 
         
         
Gene Prediction: Gene prediction on super-assembled contigs using different tools
 
Gene prediction tool | AUGUSTUS | GENSCAN | GlimmerHMM
Total gene models (>100 bp) | 90294 | 125422 | 97533
Unique to prediction tool | 2923 | 16494 | 3692
Common genes (shared between any two tools) | 93363