The minimum number of reads supporting a consensus base/read.

Open asmlgkj opened this issue 2 years ago • 6 comments

Dear professor, thanks a lot for this great tool. In the command FilterConsensusReads, there is an argument -M:

-M Int{1..3}, --min-reads=Int{1..3} The minimum number of reads supporting a consensus base/read.

If I set -M 1, does it mean that single reads (singletons) can also be used, even in a library where both R1 and R2 have UMIs?

I also found that -M can be set as -M 2 1 1. How does this work?

asmlgkj avatar Apr 25 '22 11:04 asmlgkj

To add to this I noticed that --min-reads 0 0 0 retains slightly more reads than --min-reads 1 0 0. Why is this the case?

dennis-serum avatar Apr 26 '22 14:04 dennis-serum

@dstephensSD how do you understand --min-reads 1 0 0, --min-reads 0 0 0, and -M 2 1 1?

asmlgkj avatar Apr 26 '22 14:04 asmlgkj

@asmlgkj

I am assuming duplex sequencing below (observe both strands).

In general, the -M option works as -M X Y Z, where X is total consensus depth, Y is the depth of the strand with higher depth, and Z is the depth of the strand with lower depth. When only X is given, then Y and Z are set to X.

  1. So -M 1 is equivalent to -M 1 1 1. This requires that each strand have at least 1 read, and total depth to have 1 (which is always the case if each strand has depth 1).
  2. -M 2 1 1 means that each strand must have at least 1 read, and the total depth must be 2.
  3. -M 1 0 0 means that the total depth must be 1, while each strand may have zero depth. Due to the first 1, this means that at least one of the two strands must have at least depth 1.
  4. -M 0 0 0 is nonsensical, since a consensus would not exist if no reads supported it. So it effectively turns the filtering off.
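
The four cases above can be checked with a minimal Python sketch of the rule as described here. This is an illustration only, not fgbio's implementation: the function names `expand_min_reads` and `passes_min_reads` are hypothetical, depths are treated as one number per strand, and repeating the last value when exactly two values are given is an assumption.

```python
from typing import Sequence, Tuple


def expand_min_reads(values: Sequence[int]) -> Tuple[int, int, int]:
    """Expand one to three -M values into (X, Y, Z).

    Hypothetical helper. With one value, Y and Z are set to X (as stated
    above). With two values, repeating the last one is an assumption.
    """
    if not 1 <= len(values) <= 3:
        raise ValueError("expected between one and three values")
    expanded = list(values)
    while len(expanded) < 3:
        expanded.append(expanded[-1])
    return expanded[0], expanded[1], expanded[2]


def passes_min_reads(strand_a: int, strand_b: int, min_reads: Sequence[int]) -> bool:
    """Apply the X/Y/Z rule to the raw-read depths of the two strands.

    X: minimum total depth, Y: minimum depth of the deeper strand,
    Z: minimum depth of the shallower strand.
    """
    x, y, z = expand_min_reads(min_reads)
    total = strand_a + strand_b
    higher = max(strand_a, strand_b)
    lower = min(strand_a, strand_b)
    return total >= x and higher >= y and lower >= z


# -M 1 behaves like -M 1 1 1: a molecule seen on one strand only fails.
print(passes_min_reads(1, 0, [1]))
# -M 1 0 0: one read on either strand is enough.
print(passes_min_reads(1, 0, [1, 0, 0]))
# -M 2 1 1: both strands need at least one read.
print(passes_min_reads(2, 0, [2, 1, 1]))
```

In this sketch the total depth is simply the sum of the two strand depths, so a threshold like -M 1 1 1 can only be met when the total is at least 2.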

@dstephensSD in general, it is preferable to open a new issue unless it is directly related. Could you please open a new issue?

nh13 avatar Apr 26 '22 15:04 nh13

@nh13 Thanks a lot. So if I set -M 3, it means -M X Y Z is -M 3 3 3. Is X then actually 6 (because X is the sum of Y + Z)?

The command CallMolecularConsensusReads also has an argument -M ("The minimum number of reads to produce a consensus base."), as does FilterConsensusReads ("The minimum number of reads supporting a consensus base/read."). What is the difference? In my mind they are the same, and the subsequent FilterConsensusReads step just avoids re-running CallMolecularConsensusReads with a higher -M value.

Regarding single-strand consensus sequences (SSCSs) and duplex consensus sequences (DCSs): I would also like to know whether CallDuplexConsensusReads uses only reads from molecules where both strands were observed (the efficiency of DCS recovery from SSCSs is poor), and whether CallMolecularConsensusReads uses both SSCSs and DCSs.

For both SSCSs and DCSs, can there be UMI differences within a group when grouping by UMI?

asmlgkj avatar Apr 27 '22 00:04 asmlgkj

As a heads up, I’ll be slow to respond to prioritize client work. Thanks for your understanding.

nh13 avatar Apr 27 '22 01:04 nh13

Yes, thanks for your kind help. I hope your work goes smoothly.

asmlgkj avatar Apr 27 '22 01:04 asmlgkj