
Oversequencing threshold for samples in the same run

Open fabio-t opened this issue 3 years ago • 0 comments

Following the documentation, I usually choose a single, "reasonable" oversequencing threshold from those reported by MIGEC in the Histogram step for the various samples:

In most cases, the automatic MIG size threshold selected by Histogram routine is ok. However we strongly recommend manual inspection of Histogram output files and considering to manually specify an appropriate MIG size threshold for input samples. Our experience also shows that it is a good practice to set an identical size threshold for all samples in a batch.

So for example, if I have samples with an estimated oversequencing threshold ranging from 1 to 6, I may end up choosing 3 or 4.

  • if I choose 1, I get a huge number of singletons for the oversequenced samples
  • if I choose 6, I lose a huge number of clones in the undersequenced samples
  • in the end, any threshold chosen feels arbitrary but a middle ground works "well enough"
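
To make the trade-off concrete, here is a minimal sketch of what I mean. It is not MIGEC code and does not read MIGEC's Histogram output; the two histograms are made-up toy numbers, and the names `histograms`, `retained_fraction` and `retention_table` are just mine for illustration. It only computes, for each candidate threshold, the fraction of MIGs (UMIs) that would survive in each sample.

```python
# Toy MIG size histograms: {MIG size (reads per UMI): number of MIGs}.
# These numbers are invented for illustration, not real MIGEC output.
histograms = {
    "undersequenced": {1: 500, 2: 300, 3: 150, 4: 40, 6: 10},
    "oversequenced": {1: 4000, 2: 200, 4: 300, 8: 600, 16: 400},
}


def retained_fraction(hist, threshold):
    """Return the fraction of MIGs whose size is >= threshold."""
    total = sum(hist.values())
    kept = sum(count for size, count in hist.items() if size >= threshold)
    return kept / total


def retention_table(histograms, thresholds):
    """Map each sample name to {threshold: retained fraction (3 decimals)}."""
    return {
        name: {t: round(retained_fraction(hist, t), 3) for t in thresholds}
        for name, hist in histograms.items()
    }


# Compare a low, a middle and a high threshold across both samples.
table = retention_table(histograms, [1, 3, 6])
for name, row in table.items():
    print(name, row)
```

With a shared threshold of 1 every MIG is kept, including the many size-1 MIGs in the oversequenced sample; with a shared threshold of 6 the undersequenced sample keeps almost nothing. A middle value sits between the two, which is why it feels arbitrary.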

So my question is: is the advice quoted above still the recommended practice for MIGEC, or am I better served by correcting each sample according to its specific oversequencing level?

Otherwise, would you suggest a different approach, e.g. subsampling UMIs manually based on known product, or something along those lines?

fabio-t, Jun 04 '21 16:06