Conservation module [New in 2.0]
Evolutionary conservation analysis can point to the functional relevance of circRNAs by comparing their sequence and genomic position across organisms. The conservation module lets users perform circRNA conservation analysis in five widely studied animal model species: mouse, human, rat, pig, and dog. The module is designed so that further species can be included simply by adding them to the input config file.
General usage
A call to circtools conservation --help shows all available command-line flags:
usage: circtools [-h] -d DCC_FILE -g GTF_FILE -f FASTA_FILE [-O {mm,rn,hs,ss,cl}]
                 [-TS TARGET_SPECIES] [-s SEQUENCE_FILE] [-o OUTPUT_DIR] [-T EXPERIMENT_TITLE]
                 [-t GLOBAL_TEMP_DIR] [-G GENE_LIST [GENE_LIST ...]]
                 [-GL GENE_LIST_FILE [GENE_LIST_FILE ...]]
                 [-i ID_LIST [ID_LIST ...]] [-hg19] [-mm10] [-pairwise_flag]

circular RNA conservation analysis

optional arguments:
  -h, --help            show this help message and exit

Input:
  -d DCC_FILE, --dcc-file DCC_FILE
                        CircCoordinates file from DCC / detect module
  -g GTF_FILE, --gtf-file GTF_FILE
                        GTF file of genome annotation e.g. ENSEMBL
  -f FASTA_FILE, --fasta FASTA_FILE
                        FASTA file with genome sequence (must match
                        annotation)
  -O {mm,rn,hs,ss,cl}, --organism {mm,rn,hs,ss,cl}
                        Organism of the study, mm =
                        Mus musculus, hs = Homo sapiens, rn = Rattus norvegicus,
                        ss = Sus scrofa, cl = Canis lupus familiaris
  -TS TARGET_SPECIES, --target_species
                        List of target species IDs for which conservation score
                        needs to be calculated
  -s SEQUENCE_FILE, --sequence SEQUENCE_FILE
                        FASTA file containing the circRNA sequence (exons and
                        introns)

Output options:
  -o OUTPUT_DIR, --output OUTPUT_DIR
                        Output directory (must exist)
  -T EXPERIMENT_TITLE, --title EXPERIMENT_TITLE
                        Title of the experiment for HTML output and file name

Additional options:
  -t GLOBAL_TEMP_DIR, --temp GLOBAL_TEMP_DIR
                        Temporary directory (must exist)
  -G GENE_LIST [GENE_LIST ...], --genes GENE_LIST [GENE_LIST ...]
                        Space-separated list of host gene names. Primers for
                        CircRNAs of those genes will be designed.E.g. -G
                        "CAMSAP1" "RYR2"
  -i ID_LIST [ID_LIST ...], --id-list ID_LIST [ID_LIST ...]
                        Space-separated list of circRNA IDs. E.g. -i
                        "CAMSAP1_9_135850137_135850461_-"
                        "CAMSAP1_9_135881633_135883078_-"
  -hg19, --hg19
                        Are given circular co-ordinates for human from hg19 assembly?
                        If the flag is on, these will be converted into hg38.
  -mm10, --mm10
                        Are given circular co-ordinates for mouse from mm10 assembly?
                        If the flag is on, these will be converted into mm39.
  -pairwise_flag, --pairwise_flag
                        Should pairwise alignments be performed as well?
                        Additional barplot will be plotted in this case.
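The circRNA IDs accepted by -i appear to follow the layout gene_chromosome_start_end_strand, judging by the examples in the help text. The following is a minimal sketch of how such an ID could be split into its fields, for instance when preparing an ID list from other data. The names CircId and parse_circ_id are hypothetical helpers and not part of circtools, and the layout itself is an assumption inferred from the examples above.

```python
from typing import NamedTuple


class CircId(NamedTuple):
    """Fields of a circRNA ID (hypothetical helper, not part of circtools)."""
    gene: str
    chrom: str
    start: int
    end: int
    strand: str


def parse_circ_id(circ_id: str) -> CircId:
    """Split an ID such as 'CAMSAP1_9_135850137_135850461_-' into its fields."""
    # Split from the right so that underscores inside the gene name survive.
    parts = circ_id.rsplit("_", 4)
    if len(parts) != 5:
        raise ValueError(f"unexpected circRNA ID layout: {circ_id!r}")
    gene, chrom, start, end, strand = parts
    if strand not in {"+", "-"}:
        raise ValueError(f"unexpected strand in {circ_id!r}")
    return CircId(gene, chrom, int(start), int(end), strand)


if __name__ == "__main__":
    print(parse_circ_id("Slc8a1_17_81647809_81649638_-"))
```

Splitting from the right keeps gene names that contain underscores intact, but the sketch would misparse chromosome or scaffold names that themselves contain an underscore.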
Sample call to conservation module
A sample call to the conservation module using the Jakobi et al. 2016 data requires a GTF file for exon information and the FASTA sequence of the reference genome, from which the exon sequences are extracted.
# run circtools conservation for circular RNA Slc8a1 to check its conservation in human (-TS hs)
circtools conservation -d CircCoordinate -f Mus_musculus.GRCm38.dna.primary_assembly.fa -g Mus_musculus.GRCm38.90.gtf -O mm -G Slc8a1 -o test/ -t temp/ -TS hs -pairwise_flag
Start parsing GTF file
Start merging GTF file outside the function
Slc8a1_17_81647809_81649638_-
extracting flanking exons for circRNA # 0 Slc8a1_17_81647809_81649638_-
WARNING! 54986 REST API requests remaining!
Processing target species: hs
*** Lifting over BSJ exon ***
Successfully ran liftOver command human
WARNING! 54985 REST API requests remaining!
No nearby exon found. Trying for neaby exon search using orthology information.
WARNING! 54984 REST API requests remaining!
WARNING! 54983 REST API requests remaining!
Lifted circle in target species hs is ['2', '40097269', '40115629']
mm(17:81647809- 0.000000
hs(2:40097269-4 0.936299 0.000000
mm(17:81647809- hs(2:40097269-4
Cleaning up
circtools conservation takes a few seconds to process the input data. It fetches information such as gene orthologs, liftOver coordinates, and exon sequences from the REST API. The lifted-over coordinates in the target species are written to a BED file. A phylogenetic tree for the sequence alignment is drawn and saved as an SVG file.
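To give an idea of what an orthology lookup over a REST API can look like, here is a minimal sketch that builds a request for the public Ensembl REST endpoint homology/symbol. This document does not show the requests circtools actually issues, so the choice of endpoint, the helper name orthology_request, and the query parameters are all assumptions made for illustration. The sketch only constructs the URL and headers and sends nothing.

```python
from urllib.parse import quote, urlencode

# Public Ensembl REST server; whether circtools uses exactly this endpoint
# is an assumption made for this illustration.
ENSEMBL_REST = "https://rest.ensembl.org"


def orthology_request(species: str, symbol: str, target_species: str):
    """Return (url, headers) for an Ensembl homology lookup by gene symbol."""
    path = f"/homology/symbol/{quote(species)}/{quote(symbol)}"
    # Restrict the answer to orthologues in a single target species.
    query = urlencode({"target_species": target_species, "type": "orthologues"})
    headers = {"Content-Type": "application/json"}
    return f"{ENSEMBL_REST}{path}?{query}", headers


if __name__ == "__main__":
    url, headers = orthology_request("mus_musculus", "Slc8a1", "homo_sapiens")
    print(url)
```

The species names used here follow the ortho_id style of the config file (for example mus_musculus). Ensembl REST responses carry an X-RateLimit-Remaining header, which is presumably the source of the "REST API requests remaining" messages in the log above.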
To analyse circRNA conservation in species other than those listed for the -O option, edit the config file. An example config file is provided in the folder config/. The following entries are required for each new species:
mm:
    input:     # two-letter abbreviation of the species (mm)
    id:        # genome version for liftOver chain files (mm39)
    name:      # species alias according to Ensembl REST API format (mouse)
    ortho_id:  # species name according to Ensembl REST API format (mus_musculus)
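Before adding a species, it can help to check that an entry carries all four fields listed above. The following is a minimal sketch that works on an entry already loaded into a Python dictionary; the helper name missing_species_keys is hypothetical and not part of circtools, and the example values are the mouse values given in the comments above.

```python
# The four per-species fields named in the example config.
REQUIRED_KEYS = ("input", "id", "name", "ortho_id")


def missing_species_keys(entry: dict) -> list:
    """Return the required keys that are absent or empty in one species entry."""
    return [key for key in REQUIRED_KEYS if not str(entry.get(key) or "").strip()]


# Mouse entry, using the values shown in the comments of the example config.
mouse = {"input": "mm", "id": "mm39", "name": "mouse", "ortho_id": "mus_musculus"}

if __name__ == "__main__":
    print(missing_species_keys(mouse))
```

An empty list means that all four fields are present. The check says nothing about whether the values are valid, for example whether a liftOver chain file exists for the given genome version.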