Stage | Script or executable | Output files or directories |
Initialization and configuration file verification and loading | ShapeMapper.py, parseConfigFile.py, conf.py | RUN/log.txt: file that logs pipeline stage execution and error messages. RUN/temp: folder that stores subprocess standard output and standard error streams during execution (can be deleted after run completion). RUN/output: folder that stores the bulk of the pipeline output. RUN/output/*: subfolders that store the output from each pipeline stage |
Quality trimming | trimPhred | RUN/output/trimmed_reads/*.fastq: sequencing reads trimmed left to right at the first position where the average Phred score over a window of length conf.windowSize falls below conf.minPhred; only reads whose trimmed length is greater than or equal to conf.minLength are retained |
Sequence alignment preparation | bowtie2-build (third party; ref. 38) | RUN/output/bowtie_index/*: Bowtie2 reference sequence indices |
Sequence alignment | bowtie2 (third party; ref. 38) | RUN/output/aligned_reads/*.sam: aligned sequence files, one file for each line in the configuration file section '[alignments]' |
Alignment parsing and ambiguously aligned deletion identification | parseAlignment | RUN/output/mutation_strings/*.txt: parsed and simplified alignments |
Mutation counting | countMutations, pivotCSV.py | RUN/output/counted_mutations/*.csv: mutation counts and read depths written to comma-separated files, one file for each line in configuration section '[alignments]'. RUN/output/counted_mutations_columns/*.csv: the same files arranged in column format. These files also contain the total mismatch count, total deletion count and total unambiguously aligned deletion count |
Reactivity profile creation and standard error calculation | generateReactivityProfiles.py (uses matplotlib, third party) | RUN/output/reactivity_profiles/*.tab: the most detailed output, containing mutation rates, depths, reactivities and standard errors in tab-delimited columns. RUN/output/reactivity_profiles/*.shape: simple SHAPE reactivity file, tab-delimited columns with nucleotide numbers in the first column and reactivities in the second, no-data positions indicated by −999. RUN/output/reactivity_profiles/*.map: SHAPE reactivity file including standard errors and nucleotide sequence. RUN/output/reactivity_profiles/*_histograms.pdf: histograms of mutation rates, read depths and reactivities that are useful for troubleshooting. RUN/output/reactivity_profiles/*_depth_and_reactivity.pdf: images of the read depth profile, the mutation rate above background profile and the reactivity profile |
Structure modeling | Fold (part of RNAstructure, third party; ref. 33) | RUN/output/folds/*.seq: reference sequence files in the format required by RNAstructure. RUN/output/folds/*.ct: structure models, one file for each line in configuration file section '[folds]' |
Structure drawing | pvclient.py (custom client for the third-party Pseudoviewer web service; ref. 59) | RUN/output/folds/*.eps: PostScript image files for the lowest predicted free energy structure colored by SHAPE reactivity, for each RNA specified in configuration section '[folds]'. RUN/output/folds/*.xrna: XRNA files for each lowest predicted free energy structure, which can be manually edited if desired |
The initialization stage is directly executed by the user; all subsequent stages are launched automatically from the ShapeMapper.py script. 'RUN' indicates the path to the folder from which ShapeMapper was executed, which should contain FASTA reference sequences, raw sequencing reads and a configuration file. 'conf' indicates configuration file parameters. '*' is a wild-card character indicating multiple names.
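
The quality-trimming rule listed in the table can be illustrated with a short Python sketch. This is not the trimPhred implementation. It assumes that Phred scores have already been decoded to integers, that a read is cut at the start of the first window whose mean score falls below the threshold, and that reads shorter than one window are left untrimmed. The function name trim_read and the default values standing in for conf.minPhred, conf.windowSize and conf.minLength are hypothetical.

```python
def trim_read(seq, quals, min_phred=20, window_size=3, min_length=25):
    """Trim a read at the first low-quality window (illustrative sketch only).

    seq          -- nucleotide string
    quals        -- list of integer Phred scores, one per nucleotide
    min_phred    -- stands in for conf.minPhred (default is hypothetical)
    window_size  -- stands in for conf.windowSize (default is hypothetical)
    min_length   -- stands in for conf.minLength (default is hypothetical)

    Returns (trimmed_seq, trimmed_quals), or None if the trimmed read
    is shorter than min_length.
    """
    cut = len(seq)  # by default keep the whole read
    # Slide the window left to right and stop at the first window whose
    # mean Phred score is below the threshold.
    for start in range(len(quals) - window_size + 1):
        window = quals[start:start + window_size]
        if sum(window) / window_size < min_phred:
            cut = start  # assumption: cut at the start of the failing window
            break
    if cut < min_length:
        return None  # read too short after trimming; discard it
    return seq[:cut], quals[:cut]
```

For a FASTQ quality string, the integer scores could be obtained with `[ord(c) - 33 for c in quality_string]`, assuming the common Phred+33 encoding; the encoding used by a given sequencing run should be checked.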
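
The *.shape files described in the table are simple enough to read with a few lines of Python. The sketch below relies only on the layout given above (nucleotide number in the first column, reactivity in the second, no-data positions marked by −999). It assumes that the marker is written in the file with an ASCII hyphen (-999) and that there is no header line; the function name read_shape is hypothetical and is not part of ShapeMapper.

```python
import io

NO_DATA = -999.0  # no-data marker described for *.shape files


def read_shape(handle):
    """Read a two-column SHAPE reactivity file (illustrative sketch only).

    Returns a dict mapping nucleotide number to reactivity, with None
    for positions flagged as no-data.
    """
    reactivities = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split()  # splits on tabs (and any other whitespace)
        position = int(fields[0])
        value = float(fields[1])
        reactivities[position] = None if value == NO_DATA else value
    return reactivities


# Usage with an in-memory example; a real file would be opened with open(path).
example = io.StringIO("1\t-999\n2\t0.53\n3\t1.2\n")
profile = read_shape(example)
```

Mapping the marker to None keeps no-data positions from being averaged or plotted as if they were strongly negative reactivities.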