MethylDackel icon indicating copy to clipboard operation
MethylDackel copied to clipboard

"Invalid bounds string" with the example setting from the help

Open DaGaMs opened this issue 4 years ago • 6 comments

MethylDackel extract --help says:

--OT INT,INT,INT,INT Inclusion bounds for methylation calls from reads/pairs
                  originating from the original top strand. Suggested values can
                  be obtained from the MBias program. Each integer represents a
                  1-based position on a read. For example --OT A,B,C,D
                  translates to, "Include calls at positions from A through B
                  on read #1 and C through D on read #2". If a 0 is used a any
                  position then that is translated to mean start/end of the
                  alignment, as appropriate. For example, --OT 5,0,0,0 would
                  include all but the first 4 bases on read #1. Users are
                  strongly advised to consult a methylation bias plot, for
                  example by using the MBias program.

But when I try to use --OT 5,0,0,0 I get:

$> MethylDackel extract -q 1 -p 20 -D 250 --OT 5,0,0,0 --cytosine_report hg38_full_gatk_HPV_HBV_HCV_spike-ins.fa test.bam
Invalid bounds string, 5,0,0,0

Any idea what's going on there?

DaGaMs avatar Jun 28 '20 12:06 DaGaMs

Make sure those are regular commas and not getting converted to some weird comma-like unicode characters. That's my only guess as to what's going wrong there. Otherwise that error message should only appear if the values are negative or if there aren't 4 of them.

dpryan79 avatar Jun 28 '20 15:06 dpryan79

nope, they are definitely true commas. I reverted to 4.0 and it works, back to 5.1 or current HEAD, and it doesn't

DaGaMs avatar Jun 28 '20 21:06 DaGaMs

The 0s are not correctly handled somehow. With 4.0, this worked --OT 10,0,0,140, but in 5.1, it throws said error. However, this works: --OT 10,150,1,140.

In this context: I don't really understand the --nOT parameter. How would I tell MethylDackel to trim 10bp from the beginning of each read, whether aligned to the forward or reverse strand, and irrespective of the length of the read (they are quality trimmed so uneven length)? I thought it would be --nOT 10,0,10,0, but it seems that --OT and -nOT syntax counts the reverse read in the orientation of the ref. Annoyingly, that means I have to know the length of the reverse read to trim the "beginning" of the read?

DaGaMs avatar Jun 28 '20 21:06 DaGaMs

There can be up to 4 reverse reads in BS-seq, and --nOT 10,0,10,0 will trim off the first 10 bases of those arising from the original top strand regardless of their length. I assume the confusion is that not all reverse orientation reads arose from the same strand (everyone finds this confusing at first). You never need to know the length of a read to trim its beginning; the beginning of a read is it's 5'-most base in the orientation of its alignment (i.e., the first untrimmed base that the sequencer produced).

dpryan79 avatar Jun 29 '20 20:06 dpryan79

I think there are two confusions here 😅 I should not have mixed up two issues. One is that there seems to be a bug in the parsing of 0s in the pattern string since version 0.5. The other "issue" is simply my poor understanding of the --nOT pattern syntax, which shouldn't be here in this tread, sorry 🙈

DaGaMs avatar Jun 29 '20 21:06 DaGaMs

I'll definitely look into the parsing bug, thanks for reporting it :)

dpryan79 avatar Jun 29 '20 21:06 dpryan79