Reconstruction module#

The circtools reconstruct module is based on FUCHS (FUll circular RNA CHaracterization from RNA-Seq), a Python program designed to fully characterize circular RNAs. It uses a list of circular RNAs and reads spanning the back-splice junction as well as BAM files containing the mappings of all reads (alternatively of all chimeric reads).

The reconstruction module extracts the reads from each circle and saves them in individual BAM files. Based on these BAM files, circtools detects alternative splicing within the same circle boundaries, summarizes different circular isoforms from the same host gene, and generates coverage plots for each circRNA. It also clusters circles based on their coverage profiles. These results can be used to identify potential false-positive circRNAs.

The reconstruction module depends on bedtools (>= 2.27.0), samtools (>= 1.3.1), Python (>= 3.7; pysam, pybedtools, numpy, and pathos), and R (>= 4.0.0; amap, Hmisc, gplots). All Python and R dependencies are installed automatically when installing circtools. Please make sure that the correct versions of bedtools and samtools are in your $PATH.
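Because bedtools and samtools are not installed together with circtools, it can be useful to confirm that both are reachable before starting a run. The following is a minimal sketch and not part of circtools; the helper name `find_missing_tools` is hypothetical, and the check only covers presence on the PATH, not the minimum versions listed above.

```python
import shutil


def find_missing_tools(tools=("bedtools", "samtools")):
    """Return the names of the tools that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("bedtools and samtools are both on PATH")
```

The versions themselves can then be checked by hand with `bedtools --version` and `samtools --version`.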

General usage#

In order to characterize circRNAs from RNA-seq data the following steps are necessary:

Mapping of RNA-seq data from quality checked FASTQ files with STAR (BWA, TopHat-Fusion in preparation)
Detection of circRNAs using circtools detect (CIRI, CIRCfinder or CIRCexplorer in preparation)
Running circtools reconstruct

Mapping of RNA-Seq data and detection of circRNAs#

Please see the documentation of circtools detect for instructions on how to pre-process the data.

Usage of circtools reconstruct#

We continue with the Jakobi et al. 2016 data set, which was also used as the example for the circtools detect module.

# download the wrapper script for the reconstruct module
wget https://raw.githubusercontent.com/dieterich-lab/bioinfo-scripts/master/slurm_circtools_reconstruct.sh
# add execute permission
chmod 755 slurm_circtools_reconstruct.sh

# create output directory
mkdir 03_reconstruct

# download exon annotations required by the reconstruct module
wget https://links.jakobilab.org/mm10.ensembl.exons.bed.bz2
bunzip2 mm10.ensembl.exons.bed.bz2

# circtools reconstruct is independently run on all samples:
parallel -j1  slurm_circtools_reconstruct.sh {} 03_reconstruct/ mm10.ensembl.exons.bed 01_detect/ ::: ALL_1654_M ALL_1654_N ALL_1654_O ALL_1654_P ALL_1654_Q ALL_1654_R ALL_1654_S ALL_1654_T

Manually running the reconstruction module#

The wrapper script above handles all preprocessing and conversion steps. However, advanced users may want to start the module directly. circtools reconstruct starts the pipeline, which extracts reads, checks mate status, detects alternative splicing events, classifies different isoforms, generates coverage profiles, and clusters circRNAs based on those profiles. Below is a sample call for the reconstruct module in single-sample mode using circtools detect input data:

# using STAR/circtools detect Input
$ circtools reconstruct -r 2 -q 2 -p ensembl -e 2 -T ~/tmp \
        -D CircRNACount \
        -J sample/Chimeric.out.junction \
        -F sample.1/Chimeric.out.junction \
        -R sample.2/Chimeric.out.junction.fixed \
        -B merged_sample.sorted.bam \
        -A [annotation].bed \
        -N sample

# if BWA/CIRI was used, use -C to specify the circID list (omit -D, -J, -F and -R)
# For details on the parameters please refer to the help page:
$ circtools reconstruct --help
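
When several samples are processed by hand, the sample call can also be assembled programmatically. The sketch below only builds the argument list from the flags shown in the sample call. The function name `build_reconstruct_command` and the per-sample file layout (`<sample>/`, `<sample>.1/`, `<sample>.2/`, `merged_<sample>.sorted.bam`) are assumptions made for illustration, so adjust the paths to your own data.

```python
import os
import shlex


def build_reconstruct_command(sample, annotation_bed, tmp_dir="~/tmp"):
    """Assemble the argument list of the sample call for one sample.

    The file layout mirrors the sample call and is an assumption;
    it is not something circtools requires.
    """
    return [
        "circtools", "reconstruct",
        "-r", "2", "-q", "2", "-p", "ensembl", "-e", "2",
        # "~" is only expanded by a shell, so expand it explicitly here
        "-T", os.path.expanduser(tmp_dir),
        "-D", "CircRNACount",
        "-J", f"{sample}/Chimeric.out.junction",
        "-F", f"{sample}.1/Chimeric.out.junction",
        "-R", f"{sample}.2/Chimeric.out.junction.fixed",
        "-B", f"merged_{sample}.sorted.bam",
        "-A", annotation_bed,
        "-N", sample,
    ]


if __name__ == "__main__":
    for name in ("ALL_1654_M", "ALL_1654_N"):
        command = build_reconstruct_command(name, "mm10.ensembl.exons.bed")
        # Print the command for inspection instead of running it
        print(" ".join(shlex.quote(arg) for arg in command))
```

Printing the command first makes it easy to inspect before passing the list to `subprocess.run`.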

Optional reconstruct module#

The additional module denovo_circle_structure_parallel can be employed to obtain a more refined circle reconstruction based on intron signals. The circRNA-separated BAM files (step 2) are the only input required for the module. If an annotation file is supplied, unsupported exons are reported with a score of 0; if no annotation file is supplied, unsupported exons are not reported.

$ denovo_circle_structure_parallel -c 18 -A [annotation].bed -I output/folder -N sample

# output/folder corresponds to the output directory of the circtools reconstruct pipeline
# sample corresponds to your sample name, just as specified for the pipeline

Required input data#

circRNA IDs#

CircRNA data can be provided as a generic tab-separated table with the structure shown below:

circID	read1,read2,read3
1:3740233|3746181	MISEQ:136:000000000-ACBC6:1:2107:10994:20458,MISEQ:136:000000000-ACBC6:1:1116:13529:8356
1:8495063|8614686	MISEQ:136:000000000-ACBC6:1:2118:9328:9926

The first column contains the circRNA ID, formatted as chr:start|end. The second column is a comma-separated list of the names of the reads spanning the back-splice junction.
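
To make the table layout concrete, here is a minimal Python sketch that reads such a table into a dictionary. It is an illustration, not circtools code: the function names `parse_circ_id` and `read_circ_table` are hypothetical, and the sketch assumes tab-separated columns and an optional header row starting with `circID`.

```python
from typing import Dict, Iterable, List, Tuple


def parse_circ_id(circ_id: str) -> Tuple[str, int, int]:
    """Split 'chr:start|end' into (chromosome, start, end)."""
    chrom, coords = circ_id.rsplit(":", 1)
    start, end = coords.split("|")
    return chrom, int(start), int(end)


def read_circ_table(lines: Iterable[str]) -> Dict[Tuple[str, int, int], List[str]]:
    """Map each circRNA to the reads spanning its back-splice junction."""
    circles = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("circID"):
            continue  # skip blank lines and the header row
        circ_id, reads = line.split("\t")
        circles[parse_circ_id(circ_id)] = [r for r in reads.split(",") if r]
    return circles


example = [
    "circID\tread1,read2,read3",
    "1:3740233|3746181\tMISEQ:136:000000000-ACBC6:1:2107:10994:20458,"
    "MISEQ:136:000000000-ACBC6:1:1116:13529:8356",
    "1:8495063|8614686\tMISEQ:136:000000000-ACBC6:1:2118:9328:9926",
]
table = read_circ_table(example)
```

Here `table` has one entry per circRNA, keyed by its coordinates, so the number of supporting reads for a circle is simply the length of its list.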

BAM input files#

The BAM alignment files can be produced by any suitable read mapping tool. The files have to contain all chimerically mapped reads and may also contain linearly mapped reads.

BED annotation file#

A BED file in BED6 format. The name should contain a gene name or gene ID and the exon_number. You can specify how the name should be processed using -p (platform), -s (character used to separate name and exon number) and -e (exon_index).

Chr	Start	End	Name	Strand
1	67092175	67093604	NR_075077_exon_0_0_chr1_67092176_r	-
1	67096251	67096321	NR_075077_exon_1_0_chr1_67096252_r	-
1	67103237	67103382	NR_075077_exon_2_0_chr1_67103238_r	-

Output produced by circtools reconstruct#

*.alternative_splicing.txt#

This file summarizes the relationship between different circRNAs derived from the same host gene. A sample file structure is given below:

Transcript	circles	same_start	same_end	overlapping	within
NM_016287	1:20749723-20773610	.	.	.	.
NM_005095	1:35358925-35361789,1:35381259-35389082,1:35381259-35390098	1:35381259-35389082|1:35381259-35390098,	.	.	.
NM_001291940	1:236803428-236838599,1:236806144-236816543	.	.	.	1:236803428-236838599|1:236806144-236816543,
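
For downstream analysis, a file with this layout can be loaded with a few lines of Python. The following is a minimal sketch rather than part of circtools: the function names are hypothetical, and the field conventions (tab-separated columns, `.` for an empty field, comma-terminated groups whose members are joined by `|`) are inferred from the sample above.

```python
def split_list(field, sep=","):
    """Turn a delimited field into a list; '.' marks an empty field."""
    if field == ".":
        return []
    return [item for item in field.split(sep) if item]


def parse_groups(field):
    """Parse 'a|b,c|d,' into [('a', 'b'), ('c', 'd')]."""
    return [tuple(group.split("|")) for group in split_list(field)]


def parse_alternative_splicing(lines):
    """Return one dictionary per transcript row, skipping the header."""
    rows = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 6 or fields[0] == "Transcript":
            continue  # skip the header and malformed lines
        transcript, circles, same_start, same_end, overlapping, within = fields
        rows.append({
            "transcript": transcript,
            "circles": split_list(circles),
            "same_start": parse_groups(same_start),
            "same_end": parse_groups(same_end),
            "overlapping": parse_groups(overlapping),
            "within": parse_groups(within),
        })
    return rows


example = [
    "Transcript\tcircles\tsame_start\tsame_end\toverlapping\twithin",
    "NM_016287\t1:20749723-20773610\t.\t.\t.\t.",
    "NM_005095\t1:35358925-35361789,1:35381259-35389082,1:35381259-35390098"
    "\t1:35381259-35389082|1:35381259-35390098,\t.\t.\t.",
]
rows = parse_alternative_splicing(example)
```

Each row then lists all circles of a transcript together with the groups of circles that share a start, share an end, overlap, or lie within one another.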