CircTest module
********************************************************

The CircTest module of circtools tests the variation of circRNAs with respect to their host genes. It is recommended to work with the output of the ``circtools detect`` module, but the test can also be run on custom count tables. Two tables are required: one with circular RNA counts and one with host-gene counts. Both tables must have the same order, i.e. ``circ[i,j]`` and ``linear[i,j]`` are read counts for the same circRNA in the same sample.

The ``circtools circtest`` module is based on the R package of the same name, `CircTest <https://github.com/dieterich-lab/CircTest>`_.

Required tools and packages
----------------------------

``circtools circtest`` depends on R and the following R packages:

* aod
* ggplot2
* plyr

The ``CircTest`` R package as well as all dependencies are installed during the circtools installation procedure.

Usage with ``circtools detect`` data
-------------------------------------

A call to ``circtools circtest --help`` shows all available command line flags:

.. code-block:: bash

    usage: circtools [-h] -d DETECT_DIR -l CONDITION_LIST -c CONDITION_COLUMNS -g
                     GROUPING [-r NUM_REPLICATES] [-f MAX_FDR] [-p PERCENTAGE]
                     [-s FILTER_SAMPLE] [-C FILTER_COUNT] [-o OUTPUT_DIRECTORY]
                     [-n OUTPUT_NAME] [-m MAX_PLOTS] [-a LABEL] [-L RANGE]
                     [-O ONLY_NEGATIVE] [-H ADD_HEADER] [-M {colour,bw}]

    circular RNA statistical testing - Interface to
    https://github.com/dieterich-lab/CircTest

    optional arguments:
      -h, --help            show this help message and exit

    Required:
      -d DETECT_DIR, --detect DETECT_DIR
                            Path to the circtools detect data directory
      -l CONDITION_LIST, --condition-list CONDITION_LIST
                            Comma-separated list of conditions which should be
                            comparedE.g. "RNaseR +","RNaseR -"
      -c CONDITION_COLUMNS, --condition-columns CONDITION_COLUMNS
                            Comma-separated list of 1-based column numbers in the
                            circtools detect output which should be compared; e.g.
                            10,11,12,13,14,15
      -g GROUPING, --grouping GROUPING
                            Comma-separated list describing the relation of the
                            columns specified via -c to the sample names specified
                            via -l; e.g. -g 1,2 and -r 3 would assign sample1 to
                            each even column and sample 2 to each odd column

    Processing options:
      -r NUM_REPLICATES, --replicates NUM_REPLICATES
                            Number of replicates used for the circRNA experiment
                            [Default: 3]
      -f MAX_FDR, --max-fdr MAX_FDR
                            Cut-off value for the FDR [Default: 0.05]
      -p PERCENTAGE, --percentage PERCENTAGE
                            The minimum percentage of circRNAs account for the
                            total transcripts in at least one group. [Default:
                            0.01]
      -s FILTER_SAMPLE, --filter-sample FILTER_SAMPLE
                            Number of samples that need to contain the amount of
                            reads specified via -C [Default: 3]
      -C FILTER_COUNT, --filter-count FILTER_COUNT
                            Number of CircRNA reads that each sample specified via
                            -s has to contain [Default: 5]

    Output options:
      -o OUTPUT_DIRECTORY, --output-directory OUTPUT_DIRECTORY
                            The output directory for files created by circtools
                            [Default: .]
      -n OUTPUT_NAME, --output-name OUTPUT_NAME
                            The output name for files created by circtools
                            [Default: circtest]
      -m MAX_PLOTS, --max-plots MAX_PLOTS
                            How many of candidates should be plotted as bar chart?
                            [Default: 50]
      -a LABEL, --label LABEL
                            How should the samples be labeled? [Default: Sample]
      -L RANGE, --limit RANGE
                            How should the samples be labeled? [Default: Sample]
      -O ONLY_NEGATIVE, --only-negative-direction ONLY_NEGATIVE
                            Only print entries with negative direction indicator
                            [Default: False]
      -H ADD_HEADER, --add-header ADD_HEADER
                            Add header to CSV output [Default: False]
      -M {colour,bw}, --colour {colour,bw}
                            Can be set to bw to create grayscale graphs for
                            manuscripts

Sample call
@@@@@@@@@@@

As in the other module tutorials, we use the Jakobi et al. 2016 data set from the detection module. Below is the sample call for the newly generated circtools detect data:

.. code-block:: bash

    circtools circtest -d 01_detect/ -p 0.01 -s 3 -r 4 -C 2 -g 1,2,1,2,1,2,1,2 -l RNaseR-,RNaseR+ -c 4,5,6,7,8,9,10,11 -o 04_circtest/

Here the DCC data is located in the folder ``01_detect/``. The experiment has 2 conditions, listed via ``-l RNaseR-,RNaseR+``. The samples in the circtools detect data file are sorted in the order specified via ``-g 1,2,1,2,1,2,1,2``, i.e. there are 4 ``RNaseR-`` samples and 4 ``RNaseR+`` samples in alternating columns. These ``4+4=8`` samples are found in the circtools detect data file in the columns specified via ``-c 4,5,6,7,8,9,10,11``.

Output files
@@@@@@@@@@@@@

The ``circtest`` module creates an .xlsx file that contains all circRNA candidates passing the statistical test with the given values, as well as the raw data files. Additionally, a .pdf file is generated that contains a graphical representation of the top significant circRNAs (see sample picture).

.. image:: img/circtest_sample_plot.png

Usage with external count data
-------------------------------------

In addition to the built-in functionality to directly use the data files produced by ``circtools detect``, it is also possible to use generic count tables. In this case, however, the underlying R package ``CircTest`` has to be used directly. The input tables may have many columns describing the circle or just one column containing the circle ID, followed by many columns of read counts.
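
Because the circular and the host-gene table are compared position by position, it can help to verify their alignment before running any test. The following is a minimal base-R sketch, not part of ``CircTest``: the helper ``tables_aligned()`` is hypothetical and the toy counts are made up for illustration.

.. code-block:: R

    # Toy tables with made-up values; a real analysis would read them from file.
    circ <- data.frame(
      CircID      = c("chr1:100|800", "chr1:1050|10080"),
      Control_1   = c(0, 20),
      Treatment_1 = c(5, 10),
      stringsAsFactors = FALSE
    )
    linear <- data.frame(
      CircID      = c("chr1:100|800", "chr1:1050|10080"),
      Control_1   = c(10, 80),
      Treatment_1 = c(9, 45),
      stringsAsFactors = FALSE
    )

    # Hypothetical helper (not a CircTest function): TRUE only if both tables
    # have the same shape, the same column names and the same circle
    # annotation in the same row order.
    tables_aligned <- function(circ, linear, circle_description = 1) {
      identical(dim(circ), dim(linear)) &&
        identical(colnames(circ), colnames(linear)) &&
        identical(circ[, circle_description], linear[, circle_description])
    }

    # Stop early if the tables do not line up.
    stopifnot(tables_aligned(circ, linear))

A check of this kind fails, for example, when one of the tables has been re-sorted, which would otherwise pair circular counts with the wrong host-gene counts.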

Example count table for back-spliced reads ``(Circular.csv)``
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

================== =============== ============== ============== ================ ================ ================
**CircID**         **Control_1**   **Control_2**  **Control_3**  **Treatment_1**  **Treatment_2**  **Treatment_3**
================== =============== ============== ============== ================ ================ ================
chr1:100|800       0               2              1              5                4                0
chr1:1050|10080    20              22             21             10               13               0
chr2:600|1000      0               1              0              10               0                1
chr10:4100|5400    55              54             52             56               53               50
chr11:600|1500     3               0              1              2                2                3
================== =============== ============== ============== ================ ================ ================

Example table for host-gene reads ``(Linear.csv)``
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

================== =============== ============== ============== ================ ================ ================
**CircID**         **Control_1**   **Control_2**  **Control_3**  **Treatment_1**  **Treatment_2**  **Treatment_3**
================== =============== ============== ============== ================ ================ ================
chr1:100|800       10              11             12             9                10               10
chr1:1050|10080    80              81             83             45               48               46
chr2:600|1000      5               5              2              12               8                7
chr10:4100|5400    101             110            106            150              160              153
chr11:600|1500     20              21             18             19               20               20
================== =============== ============== ============== ================ ================ ================

Sample R calls to work with generic data
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

1. Read in tables

   .. code-block:: R

       library(CircTest)
       # read.delim() expects tab-separated files with a header line
       Circ <- read.delim('Circular.csv', header = TRUE, as.is = TRUE)
       Linear <- read.delim('Linear.csv', header = TRUE, as.is = TRUE)

2. Filter tables

   To model expression data using the beta-binomial distribution and to test for differences between groups, it is beneficial to only test well-supported circles. Users may use the package's function ``Circ.filter()`` to filter the input data. The function has the following parameters:

   * ``Nreplicates``: specifies the number of replicates in each condition.
   * ``filter.sample``: specifies the number of samples in which the circle must have enough circular reads to be considered.
   * ``filter.count``: specifies the circular read count threshold.
   * ``percentage``: specifies the minimum circle to host-gene ratio.
   * ``circle_description``: tells the function which columns are NOT filled with read counts but with the circle's annotation.

   .. code-block:: R

       # filter circles by read counts
       Circ_filtered <- Circ.filter(circ = Circ,
                                    linear = Linear,
                                    Nreplicates = 3,
                                    filter.sample = 3,
                                    filter.count = 5,
                                    percentage = 0.1,
                                    circle_description = 1)

       #            CircID Control_1 Control_2 Control_3 Treatment_1 Treatment_2 Treatment_3
       # 2 chr1:1050|10080        20        22        21          10          13           0
       # 4 chr10:4100|5400        55        54        52          56          53          50

       # filter linear table by remaining circles
       Linear_filtered <- Linear[rownames(Circ_filtered), ]

       #            CircID Control_1 Control_2 Control_3 Treatment_1 Treatment_2 Treatment_3
       # 2 chr1:1050|10080        80        81        83          45          48          46
       # 4 chr10:4100|5400       101       110       106         150         160         153

3. Test for changes

   ``Circ.test()`` uses the beta-binomial distribution to model the data and performs an ANOVA to identify circles which differ in their relative expression between the groups. It is important that the grouping is correct (``group``) and that the non-read-count columns are specified (``circle_description``).

   .. code-block:: R

       test <- Circ.test(Circ_filtered,
                         Linear_filtered,
                         group = c(rep(1, 3), rep(2, 3)),
                         circle_description = 1)

       # $summary_table
       #            CircID      sig_p
       # 4 chr10:4100|5400 0.01747407
       #
       # $sig.dat
       #            CircID Control_1 Control_2 Control_3 Treatment_1 Treatment_2 Treatment_3
       # 4 chr10:4100|5400        55        54        52          56          53          50
       #
       # $p.val
       # [1] 0.153464107 0.008737037
       #
       # $p.adj
       # [1] 0.15346411 0.01747407
       #
       # $sig_p
       # [1] 0.01747407

4. Visualize data

   The CircTest library features built-in plotting functions to view significantly different circles. Sample code for visualizing the ratio as a bar plot might look like this:

   .. code-block:: R

       # one ratio plot per significant circle
       for (i in rownames(test$summary_table)) {
         Circ.ratioplot(Circ_filtered,
                        Linear_filtered,
                        plotrow = i,
                        groupindicator1 = c(rep('Control', 3), rep('Treatment', 3)),
                        lab_legend = 'Condition',
                        circle_description = 1)
       }

   To visualize the abundance of host gene and circle separately in a line plot, try:

   .. code-block:: R

       # one line plot per significant circle
       for (i in rownames(test$summary_table)) {
         Circ.lineplot(Circ_filtered,
                       Linear_filtered,
                       plotrow = i,
                       groupindicator1 = c(rep('Control', 3), rep('Treatment', 3)),
                       circle_description = 1)
       }
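
To make the read-count criterion of the filtering step more concrete, the sketch below re-implements only that part in base R. It rests on the assumption that a circle is kept when at least ``filter.sample`` samples reach ``filter.count`` circular reads; it ignores ``percentage`` and ``Nreplicates`` and is not the code used by ``Circ.filter()``. The helper name ``count_supported()`` is hypothetical; the counts are those of the example ``Circular.csv`` table.

.. code-block:: R

    # Example counts from the Circular.csv table above.
    circ <- data.frame(
      CircID = c("chr1:100|800", "chr1:1050|10080", "chr2:600|1000",
                 "chr10:4100|5400", "chr11:600|1500"),
      Control_1   = c(0, 20, 0, 55, 3),
      Control_2   = c(2, 22, 1, 54, 0),
      Control_3   = c(1, 21, 0, 52, 1),
      Treatment_1 = c(5, 10, 10, 56, 2),
      Treatment_2 = c(4, 13, 0, 53, 2),
      Treatment_3 = c(0, 0, 1, 50, 3),
      stringsAsFactors = FALSE
    )

    # Hypothetical helper (an assumption, not the CircTest implementation):
    # keep circles with at least `filter.count` reads in at least
    # `filter.sample` samples.
    count_supported <- function(circ, filter.sample = 3, filter.count = 5,
                                circle_description = 1) {
      # drop the annotation columns so that only read counts remain
      counts <- as.matrix(circ[, -circle_description, drop = FALSE])
      # number of samples per circle that reach the read count threshold
      supported <- rowSums(counts >= filter.count) >= filter.sample
      circ[supported, , drop = FALSE]
    }

    kept <- count_supported(circ, filter.sample = 3, filter.count = 5)
    print(kept$CircID)
    # [1] "chr1:1050|10080" "chr10:4100|5400"

With these thresholds the sketch keeps the same two circles that remain in the filtered example table; the other three circles never reach 5 reads in 3 samples.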