
Optimal value for --bam-min-reads parameter

Open: narges-s opened this issue 2 years ago • 1 comment

Thanks for the interesting tool! I'd like your suggestion on an optimal value for the --bam-min-reads parameter when working with human data. I have two groups of 8 samples each (paired-end RNA-seq) and was wondering whether I should increase --bam-min-reads to 2 or 3.

Thanks for the help in advance!

narges-s · Oct 08 '21 06:10

Hi @narges-s, it depends on the dataset, of course, and, to be honest, on the expression of each gene. The current system is not the best: because the threshold is an absolute read count, a highly expressed gene has a lower relative threshold for a splice site from the bam file than a lowly expressed gene, so I'm steering away from it for the next version of Whippet. That said, there is a point where adding nodes from big bam files does more harm than good; I've definitely noticed that too many noisy node splits in a Whippet index is a bad thing.

If it is very important to find de novo events but you still want high sensitivity to quantify all events, you may have to optimize for your specific dataset, i.e. find a balance that works for you. You could also try filtering the bam file for recurring spliced reads, though that may be more involved than you're willing to go. Alternatively, you could build a separate index for each bam file and count the de novo nodes in each, but that will severely limit the comparability between the psi files from each index. Maybe do both; I'm not sure what your biological questions are.
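For anyone wanting to try the "filter for recurring spliced reads" idea, here is a minimal sketch in Python. It is not part of Whippet. It assumes you have already pulled `(chrom, pos, cigar)` tuples out of each bam file (for example with a SAM/BAM parser of your choice). The function names and the `min_reads` / `min_samples` arguments are hypothetical and only for illustration; the default values are placeholders, not recommendations.

```python
import re
from collections import Counter

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
# Operations that advance the position on the reference.
REF_CONSUMING = set("MDN=X")


def junctions_from_read(chrom, pos, cigar):
    """Yield (chrom, intron_start, intron_end) for every N gap in a CIGAR.

    `pos` is the 1-based leftmost aligned position, as in a SAM record.
    Intron coordinates are 1-based and inclusive.
    """
    ref = pos
    for length, op in CIGAR_OP.findall(cigar):
        length = int(length)
        if op == "N":
            yield (chrom, ref, ref + length - 1)
        if op in REF_CONSUMING:
            ref += length


def recurrent_junctions(samples, min_reads=2, min_samples=2):
    """Return junctions seen in at least `min_reads` reads in each of at
    least `min_samples` samples.

    `samples` maps a sample name to an iterable of (chrom, pos, cigar).
    Both thresholds are hypothetical knobs for this sketch.
    """
    support = Counter()  # junction -> number of samples that pass min_reads
    for reads in samples.values():
        counts = Counter()  # junction -> spliced-read count in this sample
        for chrom, pos, cigar in reads:
            counts.update(junctions_from_read(chrom, pos, cigar))
        for junction, n in counts.items():
            if n >= min_reads:
                support[junction] += 1
    return {j for j, n in support.items() if n >= min_samples}


# Toy input: two samples share one junction; a second junction is a singleton.
samples = {
    "s1": [("chr1", 100, "50M200N50M")] * 2 + [("chr1", 100, "50M300N50M")],
    "s2": [("chr1", 120, "30M200N70M")] * 2,
}
kept = recurrent_junctions(samples)
```

Reads whose junctions are all in `kept` (or that are unspliced) could then be written to a filtered bam before building the index. Requiring recurrence across samples, rather than only raising a per-file read count, is one way to make the filter less dependent on how highly a gene is expressed in any single sample.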

I'll leave the issue open for others, but I'm not going to deal with this in Whippet v1 because v2 will allow pre-aligned reads from bam files as direct input, which solves the problem.

timbitz · Oct 08 '21 16:10