ShortStack
Low number of miRNAs identified
Hello! So I’m trying to use ShortStack for miRNA identification from sorghum root samples. To test the efficiency of our small RNA library protocol, we initially sequenced a small pool of 20 samples (which I will call Pool 1). When I ran ShortStack on them, everything seemed to work fine and 37 miRNAs were identified. We later wanted to test more samples, so we sequenced Pool 2 (consisting of 80 samples). However, this pool only yielded 12 miRNAs according to ShortStack. So I’m trying to figure out what might be causing this drastic difference between Pool 1 and Pool 2, especially since Pool 2 has more samples and I was expecting an equal or greater number of miRNAs to be identified.
Issue: Pool 2 has far fewer miRNAs identified than Pool 1 (12 vs. 37) despite having more samples (80 vs. 20). Why?
Pool 1 – 37 miRNAs
Pool 2 – 12 miRNAs
Sequencing read depth?
- Pool 1: 20 samples; average read depth of ~6.7 million reads (~⅕ of the samples had fewer than 5 million reads)
- Pool 2: 80 samples; average read depth of ~6.6 million reads (a little more than a third of the samples had fewer than 5 million reads); one outlier with 75 million reads (the average without the outlier is ~5.8 million reads)
- To test whether sequencing read depth played a role in why Pool 2 didn’t yield as many miRNAs, I re-ran ShortStack while excluding samples with low read counts. I also decided to run Pools 1 and 2 together, because in theory the miRNAs identified in Pool 1 should still show up in the results even if they aren’t found in Pool 2. Results:
Pool 1&2, more than 5 million reads: 15 miRNAs
Pool 1&2, more than 5 million reads, no outlier: 10 miRNAs
Pool 1&2, more than 1 million reads, no outlier: 12 miRNAs
- Unfortunately, even after running Pools 1 and 2 together and excluding the low-depth samples, very few miRNAs were identified. Why? (The read counting and sample filtering were done roughly as sketched below.)
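In case it matters, here is roughly how the per-sample read counts and the depth filter were handled. This is a minimal sketch rather than my exact script: the pools/ directory layout, file names, genome path, and the 5-million-read cutoff are placeholders, it assumes uncompressed FASTQ input, and it assumes the space-separated multi-file form of --readfile (worth double-checking against ShortStack --help for 3.8.5).

```bash
#!/usr/bin/env bash
# Count reads per sample (a FASTQ record is 4 lines) and keep only
# samples above a depth cutoff; paths and cutoff are placeholders.
CUTOFF=5000000
KEEP=()

for fq in pools/pool1/*.fastq pools/pool2/*.fastq; do
    reads=$(( $(wc -l < "$fq") / 4 ))
    echo "$fq: $reads reads"
    if (( reads >= CUTOFF )); then
        KEEP+=("$fq")
    fi
done

# Re-run ShortStack on the retained samples only.
ShortStack \
    --readfile "${KEEP[@]}" \
    --genomefile sorghum_genome.fa \
    --outdir shortstack_pools1and2_min5M
```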
Bad sample interfering with algorithm? Sample size?
- I then wondered if the difference in sample size had any impact on the algorithm (since Pool 1 had 20 samples and Pool 2 had 80). Additionally, I wondered if there was one bad sample that was perhaps formatted incorrectly or had some other issue that was messing up the ShortStack run somehow.
- So I divided Pool 2 into sets of 20 and ran them separately. The results are below:
Pool 2, samples 1-20: 36 miRNAs
Pool 2, samples 21-40: 0 miRNAs
Pool 2, samples 41-60: 18 miRNAs
Pool 2, samples 61-80: 10 miRNAs
- I found these results interesting, since the first set (1-20) had 36 miRNAs identified, which is comparable to Pool 1.
- The 3rd (41-60) and 4th (61-80) sets didn’t surprise me too much, since I had ordered the samples by decreasing total miRNA counts according to the initial Pool 2 run, so I would expect the later sets to have fewer miRNAs.
- But set 2 (21-40) having 0 miRNAs identified is a little strange. So, to see if there might be a problem sample mixed in somewhere, I further subdivided it into four subsets of 5 samples each. The results are below:
Pool 2, samples 21-40, subset 1 (5 samples): 33 miRNAs
Pool 2, samples 21-40, subset 2 (5 samples): 30 miRNAs
Pool 2, samples 21-40, subset 3 (5 samples): 30 miRNAs
Pool 2, samples 21-40, subset 4 (5 samples): 28 miRNAs
- These results are a little confusing to me, since they all seem fine. So why did running samples 21-40 together cause issues, while running them in sets of 5 was fine? (Each batch was run with the same pattern, sketched below.)
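For completeness, every batch run used the same invocation, just with a different sample list. Again a hedged sketch rather than my exact script: the batches/ list files, genome path, and output names are placeholders.

```bash
#!/usr/bin/env bash
# Run ShortStack separately on each batch of Pool 2 samples.
# Each batches/*.txt file lists the FASTQ paths for one batch.
GENOME=sorghum_genome.fa

for list in batches/pool2_01-20.txt batches/pool2_21-40.txt \
            batches/pool2_41-60.txt batches/pool2_61-80.txt; do
    name=$(basename "$list" .txt)
    mapfile -t FILES < "$list"   # read the per-batch sample paths
    ShortStack \
        --readfile "${FILES[@]}" \
        --genomefile "$GENOME" \
        --outdir "shortstack_${name}"
done
```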
Additional Notes: I’ve been using version 3.8.5, since a labmate of mine used that version and I wanted to keep my results comparable to his. But I could switch to the most recent version if you think that would help. I’ve also been using all of the defaults, though I have considered changing --mincov to something like 0.5 to increase sensitivity. I’ve been running everything in a Conda environment with the same script (modifying only the input samples) for all of the runs. A sketch of the setup and the --mincov change I’m considering is below.
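For reproducibility, the environment setup and the --mincov experiment would look roughly like this. The bioconda channel/package name, the example file names, and whether 0.5 actually lowers the threshold relative to the 3.8.5 default are all assumptions to verify against ShortStack --help.

```bash
# Pin ShortStack in a dedicated Conda environment
# (assumes the bioconda "shortstack" package provides 3.8.5).
conda create -n shortstack-3.8.5 -c bioconda -c conda-forge shortstack=3.8.5
conda activate shortstack-3.8.5

# Same run as before, but with an explicit cluster-coverage threshold.
# Check `ShortStack --help` to confirm how 0.5 compares to the default.
ShortStack \
    --readfile pool2_sample01.fastq pool2_sample02.fastq \
    --genomefile sorghum_genome.fa \
    --mincov 0.5 \
    --outdir shortstack_pool2_mincov0.5
```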
Do you have any ideas on why Pool 2 doesn’t seem to be working normally? Any help would be greatly appreciated. Thanks!