Conservation module [New in 2.0]

Conservation module [New in 2.0]#

Evolutionary conservation analysis oftentimes uncovers the potential functional relevance of circRNAs by comparing their sequence and genomic position across different organisms. We developed the conservation module to enable users to perform circRNA conservation analysis in five widely studied animal model species: mouse, human, rat, pig, and dog. The framework of the conservation module was developed with the flexibility to incorporate more species in the analysis by simply adding the species to the input config file.

General usage#

A call to circtools conservation --help shows all available command line flags:

usage: circtools [-h] -d DCC_FILE -g GTF_FILE -f FASTA_FILE [-O {mm,rn,hs,ss,cl}]
                 [-TS TARGET_SPECIES] [-s SEQUENCE_FILE] [-o OUTPUT_DIR] [-T EXPERIMENT_TITLE]
                 [-t GLOBAL_TEMP_DIR] [-G GENE_LIST [GENE_LIST ...]]
                 [-GL GENE_LIST_FILE [GENE_LIST_FILE ...]]
                 [-i ID_LIST [ID_LIST ...]] [-hg19] [-mm10] [-pairwise_flag]

circular RNA conservation analysis

optional arguments:
  -h, --help            show this help message and exit

Input:
  -d DCC_FILE, --dcc-file DCC_FILE
                        CircCoordinates file from DCC / detect module
  -g GTF_FILE, --gtf-file GTF_FILE
                        GTF file of genome annotation e.g. ENSEMBL
  -f FASTA_FILE, --fasta FASTA_FILE
                        FASTA file with genome sequence (must match
                        annotation)
  -O {mm,rn,hs,ss,cl}, --organism {mm,rn,hs,ss,cl}
                        Organism of the study, mm =
                        Mus musculus, hs = Homo sapiens, rn = Rattus norvegicus,
                        ss = Sus scrofa, cl = Canis lupus familiaris
  -TS TARGET_SPECIES,       --target_species
                        List of target species IDs for which conservation score
                        needs to be calculated
  -s SEQUENCE_FILE, --sequence SEQUENCE_FILE
                        FASTA file containing the circRNA sequence (exons and
                        introns)

Output options:
  -o OUTPUT_DIR, --output OUTPUT_DIR
                        Output directory (must exist)
  -T EXPERIMENT_TITLE, --title EXPERIMENT_TITLE
                        Title of the experiment for HTML output and file name

Additional options:
  -t GLOBAL_TEMP_DIR, --temp GLOBAL_TEMP_DIR
                        Temporary directory (must exist)
  -G GENE_LIST [GENE_LIST ...], --genes GENE_LIST [GENE_LIST ...]
                        Space-separated list of host gene names. Primers for
                        CircRNAs of those genes will be designed.E.g. -G
                        "CAMSAP1" "RYR2"
  -i ID_LIST [ID_LIST ...], --id-list ID_LIST [ID_LIST ...]
                        Space-separated list of circRNA IDs. E.g. -i
                        "CAMSAP1_9_135850137_135850461_-"
                        "CAMSAP1_9_135881633_135883078_-"
  -hg19, --hg19
                        Are given circular co-ordinates for human from hg19 assembly?
                        If the flag is on, these will be converted into hg38.
  -mm10, --mm10
                        Are given circular co-ordinates for mouse from mm10 assembly?
                        If the flag is on, these will be converted into mm39.
  -pairwise_flag, --pairwise_flag
                        Should pairwise alignments be performed as well?
                        Additional barplot will be plotted in this case.

Sample call to conservation module#

A sample call to conservation using the Jakobi et al. 2016 data requires the GTF file for exon information and the Fasta sequence of the reference genome in order to obtain the exon sequences.

# run circtools conservation for circular RNA Slc8a1 to check its conservation in human and dog
circtools conservation -d CircCoordinate -f Mus_musculus.GRCm38.dna.primary_assembly.fa -g Mus_musculus.GRCm38.90.gtf -O mm -G Slc8a1 -o test/ -t temp/ -TS hs -pairwise
Start parsing GTF file
Start merging GTF file outside the function
Slc8a1_17_81647809_81649638_-
extracting flanking exons for circRNA # 0 Slc8a1_17_81647809_81649638_-
WARNING! 54986 REST API requests remaining!
Processing target species:  hs
*** Lifting over BSJ exon ***
Successfully ran liftOver command human
WARNING! 54985 REST API requests remaining!
No nearby exon found. Trying for neaby exon search using orthology information.
WARNING! 54984 REST API requests remaining!
WARNING! 54983 REST API requests remaining!
Lifted circle in target species  hs  is  ['2', '40097269', '40115629']
mm(17:81647809- 0.000000
hs(2:40097269-4 0.936299    0.000000
    mm(17:81647809- hs(2:40097269-4
Cleaning up

circtools conservation takes a few seconds to process the input data. It fetches the information like gene orthologs, liftOver co-ordinates, exon sequences from REST API. The lifted over co-ordinates in target species are written in BED file. A phylogenetic tree for sequence alignement is drawn and saved in an SVG file.

If user wants to perform circle conservation analysis for species other than mentioned in the -O option, it can be easily done by editing the config file. An example config file is provided in the folder config/. Following entries per species are required in order to include a new species:

mm:
        input: # two letter abbreviation of the species (mm)
        id: # genome versions for liftOver chain files (mm39)
        name: # species alias according to Ensembl Rest API format (mouse)
        ortho_id: # species name according to Ensembl Rest API format (mus_musculus)