rnaseq
Improve/add UMI deduplication metrics
Description of feature
Hello ^^ I'm having difficulties finding easy-to-understand stats on UMI deduplication in the outputs. There seems to be no section in the MultiQC output, not even in the statistics table (where I would expect metrics such as the number of reads before dedup, the number of reads after dedup, and % duplication (from umi-tools dedup on alignments)). In the output directory, I also can't find a log with easy-to-understand metrics from umi-tools dedup. I'm probably missing something. Thanks in advance. Pierre
Since people complained about the poor performance, the generation of deduplication statistics is off by default now. To activate that functionality, you have to set the parameter --umitools_dedup_stats on the command line, or umitools_dedup_stats: true in a params file.
Hi @MatthiasZepper,
I'm sorry if I wasn't clear enough in my initial message. All my comments apply to the pipeline run with the --umitools_dedup_stats parameter activated.
In the *.umi_dedup.transcriptome.filtered.prepare_for_rsem.log files there are no summaries with the dedup stats, and the other files are neither very informative nor easy to read: *.umi_dedup.sorted_edit_distance.tsv, *.umi_dedup.sorted_per_umi_per_position.tsv, *.umi_dedup.sorted_per_umi.tsv. There is a real need for an easy-to-read summary of deduplication, such as the one that can be obtained through MultiQC parsing of the UMI-tools logs, for example (https://github.com/MultiQC/MultiQC/pull/1769).
Right now, as a user I have even less information about deduplication than what I would have in the logs just by running the umi-tools dedup command.
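As a stopgap, the three metrics asked for above (reads before dedup, reads after dedup, % duplication) could be pulled out of a umi-tools dedup log with a few lines of Python. This is only a minimal sketch: it assumes the log contains lines with the wording "Input Reads: N" and "Number of reads out: N", which may differ between umi-tools versions, and the function name summarize_dedup_log is hypothetical, not part of the pipeline or of umi-tools.

```python
import re

# ASSUMPTION: wording of the two summary lines in a umi_tools dedup log.
# Adjust the patterns if your umi-tools version prints them differently.
INPUT_RE = re.compile(r"Input Reads:\s*(\d+)")
OUTPUT_RE = re.compile(r"Number of reads out:\s*(\d+)")


def summarize_dedup_log(text):
    """Return reads before/after dedup and the percentage removed.

    Hypothetical helper; returns None if either count is missing from the log.
    """
    m_in = INPUT_RE.search(text)
    m_out = OUTPUT_RE.search(text)
    if m_in is None or m_out is None:
        return None
    reads_in = int(m_in.group(1))
    reads_out = int(m_out.group(1))
    # Percent duplication = share of input reads removed by deduplication.
    pct = 100.0 * (reads_in - reads_out) / reads_in if reads_in else 0.0
    return {
        "input_reads": reads_in,
        "output_reads": reads_out,
        "percent_duplication": round(pct, 2),
    }
```

Calling summarize_dedup_log(open(path).read()) on each per-sample log would give one small dictionary per sample, which is roughly what one would expect to see as columns in the MultiQC general statistics table.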
Apologies for stonewalling on this issue before. While hunting down the cause for issue #1303, it occurred to me that probably a botched MultiQC config is behind this issue as well. For some reason, we explicitly specify the MultiQC modules to be run and UMI-tools is nowhere to be found.
Since we run MultiQC with a custom config outside the pipeline again, we did not notice.
It should be fixed on this branch, but I struggle with testing at the moment.
#1308 has been merged to dev and will be released as part of rnaseq 3.15. Please give it a spin to see if it solves this issue, @ppericard!
@MatthiasZepper Thank you for dealing with this issue. I'm currently taking an extended leave from bioinformatics for the foreseeable future, so hopefully someone from the community will be able to test this. Cheers