RNASimulASE - Simulator of Allele Specific RNA-seq data

Copyright © 2013 Daniel Edsgärd, Olof Emanuelsson
RNASimulASE is available free to use, under the GNU GPL version 3 license.
This product includes the software hapgen2, developed by Zhan Su et al., which is freely available for academic use only.
  1. About
  2. Download
  3. Installation
    1. Prerequisites
    2. Installing a binary distribution
    3. Building from source
  4. Running RNASimulASE
    1. Quick start
    2. Introduction and output directories
    3. Options
    4. Parallelization
    5. Annotation
  5. Citing RNASimulASE
  6. Contact information

About

RNASimulASE is a tool for simulating allele-specific expression (ASE) data generated by RNA sequencing. Recently, there has been increased interest in leveraging the diploid nature of genomes in genetic research (Tewhey et al., NatRevGen, 2011). In RNA-seq this is realized by so-called allele-specific expression (ASE), where the transcriptional level of each allele from a pair of homologous chromosomes can be distinguished at heterozygous variants.

To evaluate experimental designs, as well as the robustness and validity of emerging ASE analysis approaches, simulation of allele-specific RNA-seq data is crucial. Given a reference transcriptome, genetic variants, recombination rates, and empirical transcript expression levels, RNASimulASE generates diploid personal transcriptomes in FASTA format and RNA-seq output in FASTQ format. Additional features include base quality sampling and sequencing error simulation based on empirical data from an actual run of a sequencing machine. All input parameters have default values, which makes the program easy to use. See the paper for further information.

Download

RNASimulASE is available via SourceForge: Download SimulASE.

Installation

Prerequisites

Apart from the provided C++ binaries, RNASimulASE also makes use of Ruby and R scripts. Working installations of both languages are therefore needed. Ruby is shipped as part of most OS distributions. R can be downloaded from the R Project website.
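
Before installing, it can help to confirm that both interpreters are reachable from the shell. The following is a minimal sketch and not part of RNASimulASE: the function name check_tools is hypothetical, and it assumes the interpreters are installed under the command names ruby and Rscript (Rscript ships with standard R installations).

```shell
# Report which of the given commands cannot be found on PATH.
# Prints one line per missing tool and returns non-zero if any is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Example (assumed command names):
# check_tools ruby Rscript
```

A silent return with exit status 0 means every listed tool was found.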

Installing a binary distribution

Binaries are provided for Linux (x86_64) and OS X (Intel).
  1. Download the binaries of RNASimulASE: rnasimulase-X.X.platform.tar.gz.
    If simulating human data, annotation is also available for download: annot.tar.gz
  2. Extract the binaries and annotation:
    "tar -xvzf rnasimulase-X.X.platform.tar.gz"
    "tar -xvzf annot.tar.gz"
  3. Add the binary directory to your shell PATH, by adding the following to your ~/.bashrc (Linux) or ~/.profile (OS X) shell startup file. Similarly, set an environment variable "ANNOT" to the annotation directory, within the same shell startup file:
    "export PATH=/path/to/rnasimulase/bin:$PATH"
    "export ANNOT=/path/to/rnasimulase/annot"
    Note: Users of shells other than bash need to amend the PATH setting command accordingly in their corresponding shell startup file.
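
After opening a new shell, the settings from step 3 can be verified with a short check. This is a sketch under stated assumptions, not a script shipped with RNASimulASE: the function name check_install is hypothetical, and the program name is passed as an argument (defaulting to rnasimulase) rather than hard-coded.

```shell
# Verify that ANNOT points at an existing directory and that the
# given program (default: rnasimulase) can be found on PATH.
check_install() {
    prog=${1:-rnasimulase}
    status=0
    if [ -z "${ANNOT:-}" ] || [ ! -d "$ANNOT" ]; then
        echo "ANNOT is unset or not a directory"
        status=1
    fi
    if ! command -v "$prog" >/dev/null 2>&1; then
        echo "$prog not found on PATH"
        status=1
    fi
    return "$status"
}

# Example:
# check_install rnasimulase
```

If either message is printed, re-check the two export lines in the shell startup file.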

Building from source

The source code is distributed along with the binaries. Compared with installing the binary distribution (steps 1-3 above), the only additional steps are:
  1. Go to the '/path/to/rnasimulase/src' directory and type:
    "make"
    Note: The makefile requires a gcc version that includes the ISO C++ 2011 standard library (the binaries for rnasimulase-1.0 were compiled with gcc 4.7).
  2. To compile hapgen2 on a different platform, its developers ask that you contact them. Once compiled, put the hapgen binary into the rnasimulase binary directory ('/path/to/rnasimulase/bin').
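
The compiler version can be checked before running make. The sketch below is illustrative only: version_ge is a hypothetical helper, and 4.7 is used as the threshold solely because it is the version named in the note above, not because it is a documented minimum.

```shell
# Succeed if dot-separated version $1 is at least version $2.
# Missing components count as 0, so "13" is treated as "13.0".
version_ge() {
    awk -v have="$1" -v need="$2" 'BEGIN {
        n = split(have, h, ".")
        m = split(need, r, ".")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            a = (i <= n) ? h[i] + 0 : 0
            b = (i <= m) ? r[i] + 0 : 0
            if (a > b) exit 0
            if (a < b) exit 1
        }
        exit 0
    }'
}

# Example (assumes gcc is the compiler picked up by the makefile):
# version_ge "$(gcc -dumpversion)" 4.7 || echo "gcc may be too old"
```

On systems where the gcc command is a wrapper around another compiler, the reported version may not be meaningful for this comparison.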

Running RNASimulASE

Quick start

To run with default parameters, type: simulase
For help, type: simulase -h

Introduction and output directories

Typing rnasimulase runs the full RNASimulASE pipeline, executing the suite of subprograms in the following order: simhaplotranscriptome, simexpr, simfastqtrain and simfastq.

By default, scripts generated and executed by rnasimulase are put in a directory called "cmds" and data is written to a directory called "data". Within "data", three sub-directories are created:

Options

Options are described in detail when the -h option is passed to rnasimulase or any of its subprograms. Furthermore, examples of the input file formats are provided in the annotation (annot.tar.gz). To get detailed help for any of the options, type:
Note: To get further help on hapgen2 options, visit the hapgen2 website.

Parallelization

To simulate a very large number of individuals, rnasimulase can be executed as separate tasks on separate processors. To facilitate this, the option "-r" is provided, which writes the output to a randomly named directory.
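
On a single multi-core machine, separate tasks can be started as background jobs from one shell. The following is a minimal sketch rather than a feature of RNASimulASE: run_tasks is a hypothetical helper, and the example assumes that "-r" is a plain flag that takes no argument.

```shell
# Run the same command N times as background jobs and wait for all of them.
# Usage: run_tasks N command [args...]
# Returns non-zero if any of the jobs failed.
run_tasks() {
    n=$1
    shift
    pids=""
    i=0
    while [ "$i" -lt "$n" ]; do
        "$@" &
        pids="$pids $!"
        i=$((i + 1))
    done
    status=0
    for pid in $pids; do
        wait "$pid" || status=1
    done
    return "$status"
}

# Example (assumes rnasimulase is on PATH): four independent runs,
# each writing to its own output directory.
# run_tasks 4 rnasimulase -r
```

On a cluster, the same idea applies with the scheduler's own job submission command in place of background jobs.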

Annotation

We provide annotation files for analysis of human data. This includes the files needed to run Simdiplotranscriptome (hapgen and writing diploid transcriptomes), Simexpr and Simfastqtrain:

Citing RNASimulASE

Edsgärd D. and Emanuelsson O., RNASimulASE - Simulation of Allele Specific RNA-seq Data, 2013 (Submitted)

Contact information

Please use the Discussion Forum for issues related to running the program. Other comments or questions may be e-mailed to:
Technical: Daniel Edsgärd
PI: Olof Emanuelsson