slamdunk
4sU-labeled mRNA library built using total RNA-seq
Dear Tobias:
Part of my data is a 4sU-labeled mRNA library built with a total RNA-seq protocol. How do I use slamdunk to analyze it? This is the same issue as #89.
By the way, how does slamdunk map handle mismatches? Is the number of allowed mismatches related to read length, i.e. do longer reads allow more mismatches? Which parameter should I set?
best
What do you mean by parts of the library? In essence, I would refer you to @popitsch, who has some solutions in place for the Ameres lab for such cases.
What do you mean by the mismatch of slamdunk map? There is no parameter to set here.
Dear Tobias:
Sorry, I did not express myself clearly. I have S4U Quant-seq paired-end data and S4U total RNA-seq paired-end data.
Because the S4U total RNA-seq data is paired-end, it may need to be mapped with HISAT-3N and then processed with the slamdunk "filter", "snp" and "count" steps, with "3utr.bed" replaced by "transcript.bed".
For the S4U Quant-seq paired-end data, following your previous suggestion, I would select the R1 reads and run "slamdunk all" with "3utr.bed".
Are the above steps okay (see #104)? The above method does not seem easy to perform.
Does slamdunk map have a parameter for the number of mismatched bases allowed between a read and the reference genome? If it does, I would expect longer reads to need a higher value. Read length depends on the sequencer: some produce 50 bp reads, others 150 bp.
best
Hm, I would probably use the same pipeline for both sets to keep them comparable. For this, Niko should be able to help with mapping and quantifying the reads outside of slamdunk.
In principle, you shouldn't need to tweak the slamdunk map parameters, as they take the percentage of mismatches into account, which is of course independent of read length.
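To illustrate why a percentage-based threshold does not need adjusting per read length, here is a minimal Python sketch. It is not slamdunk's implementation: the function names and the 5% value are hypothetical and chosen only for illustration.

```python
def max_mismatches(read_length: int, max_mismatch_percent: int = 5) -> int:
    """Absolute mismatch budget implied by a percentage threshold.

    The 5% default is a hypothetical value for illustration only,
    not a documented slamdunk setting.
    """
    # Integer arithmetic avoids floating-point rounding surprises.
    return read_length * max_mismatch_percent // 100


def passes_filter(read_length: int, mismatches: int,
                  max_mismatch_percent: int = 5) -> bool:
    """True if the read stays within the percentage-based budget."""
    return mismatches <= max_mismatches(read_length, max_mismatch_percent)


if __name__ == "__main__":
    # The same percentage yields a larger absolute budget for longer reads.
    for length in (50, 100, 150):
        print(length, "bp ->", max_mismatches(length), "mismatches allowed")
```

With a single percentage setting, a 50 bp read and a 150 bp read are held to the same relative standard, so the absolute number of tolerated mismatches scales with read length on its own.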
Hi, I am currently developing rnalib, a Python library (https://github.com/popitsch/rnalib) that could be used for the analysis of 4sU datasets. On GitHub you will also find a (simplified!) SLAM-seq full transcriptome timecourse tutorial that describes the analysis steps from mapped BAM files to actual half-life predictions.
For mapping your SLAM-seq reads to the genome, you could use, for example, STAR (make sure that you output the MD tag via the --outSAMattributes parameter and pass a GTF file via the --sjdbGTFfile parameter to inform STAR about known splice sites, e.g., with a command line like the one shown below) or HISAT-3N. In our splice_sim project (https://github.com/popitsch/splice_sim, manuscript in review) we also looked at the influence of T/C conversions on spliced read mappability and found both mappers to work well in scenarios with typical SLAM-seq conversion rates (what 4sU concentrations did you use? What conversion rates do you expect?).
@popitsch
Dear Niko: Thank you for your reply. The example command referred to in "e.g., with a command line like the one shown below" is not shown. Can you give me a paired-end example?
Another question is whether STAR needs to be optimized for T>C. Is it possible that a large number of T>C conversions in a read prevents it from mapping to the reference genome? I see that HISAT-3N can set --base-change T,C to avoid penalizing T>C mismatches.
Hi, sorry, I forgot to add the command-line example, but it's nothing fancy, just something like:
STAR --genomeDir ${star_index} \
--sjdbGTFfile ${star_gtf} \
--outSAMattributes NH HI AS nM MD \
--readFilesCommand zcat \
--readFilesIn ${mate1_fqz} ${mate2_fqz} \
<your_other_parameters>
The MD tag is needed by rnalib to quickly calculate T/C mismatches per read. Passing known splice junctions to STAR (either when building the index or via --sjdbGTFfile) really makes a difference with respect to spliced-read mappability and is highly recommended.
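As a rough illustration of how an MD tag allows T/C mismatches to be counted per read, here is a minimal Python sketch. It is not rnalib's implementation: the function name is hypothetical, and it assumes a forward-strand alignment whose read sequence contains no insertions or soft-clipped bases (handling those would also require the CIGAR string).

```python
import re

# MD tag tokens: a run of matches (digits), a deletion from the
# reference (^ followed by bases), or a single mismatched reference base.
MD_TOKEN = re.compile(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])")


def count_tc_mismatches(read_seq: str, md_tag: str) -> int:
    """Count positions where the reference has T and the read has C.

    Assumption: read_seq has no insertions or soft clips, so read
    positions can be tracked from the MD tag alone.
    """
    pos = 0  # current offset in the read sequence
    tc = 0
    for match_len, deletion, ref_base in MD_TOKEN.findall(md_tag):
        if match_len:
            pos += int(match_len)  # matching bases: just advance
        elif deletion:
            continue  # deleted reference bases are absent from the read
        else:
            # Mismatch: MD stores the reference base, the read holds the other.
            if ref_base.upper() == "T" and read_seq[pos].upper() == "C":
                tc += 1
            pos += 1
    return tc


if __name__ == "__main__":
    # Reference ACGTACGT, read ACGCACGC: two T>C mismatches.
    print(count_tc_mismatches("ACGCACGC", "3T3T0"))
```

For reads aligned to the reverse strand, the same conversion shows up as A>G relative to the reference, so a real analysis would need to take strand into account as well.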
I will add a paired-end section to the rnalib SLAM-seq tutorial soon, probably next week.
You can, of course, also look at GRAND-SLAM/grandR for such an analysis.
Re. STAR and T>C conversions: We looked into this and conducted a detailed study of spliced read mapping accuracies of nucleotide conversion reads that includes a comprehensive evaluation of HISAT-3N and STAR. HISAT-3N is indeed insensitive to T>C mismatches due to its 3N alignment strategy, but this reduced alphabet, in turn, also leads to reduced mappability in some situations. Overall, when considering realistic T/C conversion rates between 1 and 5%, both mappers performed well and there was no clear-cut winner. On some transcripts STAR was better, and on others HISAT-3N was. Our results include detailed mappability statistics per read mapper for all annotated mouse and human GENCODE transcripts, exons, introns and splice junctions that can be used for data filtering/cleaning tasks. The respective publication is currently under review and I can let you know once it's out.
Anyway, I guess we should move any further discussion of this to the rnalib space, as this is not about slamdunk anymore... BW
Just as a follow-up: I have now committed the PairedEndIterator and extended the SLAM-seq tutorial accordingly. This will be in the next build; for now you can install the latest version via pip install git+https://github.com/popitsch/rnalib.git
BW