
The Center of Excellence for Genomics (CEG)


Partner: Chinese Academy of Sciences

Genomics is considered one of the most advanced scientific fields; it identifies genes from DNA or RNA sequences. These sequences are chains built from four different nucleotides that can run to thousands of millions of units in length, arranged into groups of coding sequences (genes) responsible for the functional expression behind the biological processes that enable species to live, grow and reproduce. Analysis of the complete genome sequence of a given species helps to establish its physical and genetic maps, which in turn help determine the function of the genes in its genome and lead to an understanding of its growth, survival and disease resistance mechanisms.

Genome sequencing is a laboratory process to determine the complete DNA sequence of an organism.


Developing a high-quality reference assembly is more demanding for plants with highly repetitive genomes (in most cases more than 60% repetitive). To achieve a complete genome assembly of date palm, a high-quality physical map is a necessary step; it is a useful tool for a wide range of applications, including assembly validation, comparative genomics and gene cloning. Most physical maps provide medium resolution (100 kb to 1 Mb) to link the shotgun assembly to the genetic map, and mainly bridge the scaffolds that are interrupted by large repeats in plant genomes. The newest optical mapping technology, which is used to build the physical map, can also provide information on large-scale structural variants that are quite difficult to identify with traditional methods.

  • Coconut organelle genome-mitochondrial
  • Coconut organelle genome-chloroplast

Coconut, a member of the palm family, is one of the most economically important crops in the tropics as a source of food, drink, fuel, medicines and construction material. Based on next-generation sequencing data, we assembled the mitochondrial (mt) genome of coconut (Oman local tall cultivar) into a molecule of 678,653 bp in length with 45.5% GC content. The mt genome of C. nucifera encodes 68 proteins (86 genes), 10 pseudogenes (11 genes), 23 tRNAs (17 amino acid codon and 1 stop codon, 42 tRNA genes) and 3 ribosomal RNAs (x2), which constitutes a gene content of 11.41% (77,465 bp) over the full length. Across the whole genome, the cp (chloroplast)-derived region accounts for 5.07% and includes 11 proteins (13 genes), 3 pseudogenes and 11 tRNAs. The mt genome has a relatively large repeat fraction (17.26%), including forward repeats and inverted repeats. By mapping eight RNA-seq data sets to the mt genome, 734 RNA editing sites were identified, each supported by at least two data sets. C. nucifera clustered with Phoenix dactylifera and Butomus umbellatus in monocot plants, while the 18S rRNA phylogenetic tree showed that B. We also studied the transcriptome profiles based on RNA-seq data, and many hypothetical proteins and pseudogenes showed significant differential expression. In summary, we provide the second complete mt genome sequence in the family Arecaceae, which can be used for further investigations on the mitochondrial biology of seed plants.
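
The summary figures above (GC content, gene content as a share of genome length) are simple ratios. The Python sketch below shows how such values could be computed; it is an illustration only, not the pipeline used for the coconut assembly. The function names and the toy sequence are hypothetical, while the two base-pair counts are taken from the text.

```python
def gc_content(seq: str) -> float:
    """Return the percentage of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def percent_of_genome(feature_bp: int, genome_bp: int) -> float:
    """Return the share of the genome (in percent) covered by a feature."""
    if genome_bp <= 0:
        raise ValueError("genome length must be positive")
    return 100.0 * feature_bp / genome_bp


# Values reported in the text for the coconut mt genome.
GENOME_BP = 678_653  # total assembled length
GENE_BP = 77_465     # total length annotated as genes

gene_pct = round(percent_of_genome(GENE_BP, GENOME_BP), 2)
print(f"gene content: {gene_pct}%")  # 11.41%, matching the reported figure

# Hypothetical toy sequence, used only to show the GC calculation.
toy_seq = "ATGCGCATTA"
print(f"GC content of toy sequence: {gc_content(toy_seq)}%")  # 40.0%
```

Applied to the full assembled sequence instead of the toy string, gc_content would give the genome-wide GC percentage; the same percent_of_genome ratio applies to the chloroplast-derived and repeat fractions once their lengths in base pairs are known.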