# Mapping reads to the transcriptome with TopHat

Now that we have some quality-controlled reads, we’re going to map them to the reference gene set so that we can count how many reads came from each gene. We’ll be using the TopHat software.

For this purpose, we’ve already installed the human reference gene set on the HPC (as part of the data you loaded at the beginning). In this case, we’ve loaded the Illumina iGenomes files into the RNAseq-model data set.

First, load the TopHat module:

module load TopHat2/2.0.12


And now run TopHat:

cd ~/rnaseq
tophat -p 4 \
-G ~/RNAseq-model/Homo_sapiens/Ensembl/GRCh37/Annotation/Genes/genes.gtf \
--transcriptome-index=$HOME/RNAseq-model/transcriptome \
-o tophat_salivary_repl1 \
~/RNAseq-model/Homo_sapiens/Ensembl/GRCh37/Sequence/Bowtie2Index/genome \
salivary_repl1_R1.qc.fq.gz salivary_repl1_R2.qc.fq.gz


This will take about 15 minutes.

Questions:

• What are all these parameters?!
• How do we pick the transcriptome/genome?
• Why is it so slow?
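
As a starting point for the first question, here is a rough sketch that restates the TopHat command with each argument stored in a named variable. The variable names (`THREADS`, `GTF`, `TX_INDEX` and so on) are made up for illustration and are not part of TopHat; the values are the ones used above. The sketch only prints the assembled command and does not run TopHat.

```shell
# Variable names are illustrative; the values come from the TopHat command above.
THREADS=4                                              # -p: number of threads TopHat may use
IGENOMES="$HOME/RNAseq-model/Homo_sapiens/Ensembl/GRCh37"
GTF="$IGENOMES/Annotation/Genes/genes.gtf"             # -G: known gene annotation (GTF)
TX_INDEX="$HOME/RNAseq-model/transcriptome"            # --transcriptome-index: prefix of the transcriptome index files
OUT_DIR="tophat_salivary_repl1"                        # -o: directory where results are written
GENOME_INDEX="$IGENOMES/Sequence/Bowtie2Index/genome"  # Bowtie2 index base name (no .bt2 suffix)
READS_R1="salivary_repl1_R1.qc.fq.gz"                  # first mate of each read pair
READS_R2="salivary_repl1_R2.qc.fq.gz"                  # second mate of each read pair

CMD="tophat -p $THREADS -G $GTF --transcriptome-index=$TX_INDEX -o $OUT_DIR $GENOME_INDEX $READS_R1 $READS_R2"

# Print the assembled command for inspection; this sketch does not run TopHat.
echo "$CMD"
```

When an annotation is supplied with `-G`, TopHat aligns reads to the annotated transcripts first and only then to the genome, which is why both a transcriptome index and a genome index appear in the command.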

Once TopHat has finished, count the mapped alignments in its output:

samtools view -c -F 4 tophat_salivary_repl1/accepted_hits.bam
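
In the samtools command above, `-c` prints a count instead of the alignment records, and `-F 4` skips any record whose FLAG field has the 0x4 ("segment unmapped") bit set. The sketch below mimics that bit test on a few example FLAG values; `passes_F4` is a hypothetical helper written for illustration, not something samtools provides.

```shell
# Hypothetical helper: succeeds if a SAM FLAG value would pass `-F 4`,
# i.e. the 0x4 "segment unmapped" bit is NOT set.
passes_F4() {
    [ $(( $1 & 4 )) -eq 0 ]
}

# A few example FLAG values: 0, 16 and 99 describe mapped records,
# while 4, 77 and 141 all have the unmapped bit set.
for flag in 0 16 99 4 77 141; do
    if passes_F4 "$flag"; then
        echo "FLAG $flag: kept"
    else
        echo "FLAG $flag: dropped"
    fi
done
```

Note that the count is of alignment records, so with paired-end data each mate is counted separately, and a read reported at several locations contributes more than one record.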