Question about SvABA parameter, "--chunk-size"
Hi,
I read your explanation of "--chunk-size" in the GitHub issue "Parameters for max reads and chunk size #68". You explained that "Max chunk also just changes the size of the anchor window." (The old name of the parameter is "max_chunk", right?) The parameter definition says "Size of a local assembly window (in bp). Set 0 for whole-BAM in one assembly." I would like to know how --chunk-size affects the detection of structural variants such as exon duplications. Could you advise whether I should set --chunk-size to 0 or to a large number like 30000?
I also read the parameter definitions for '--read-tracking' and '--error-rate':
--read-tracking       Track supporting reads by qname. Increases file sizes. [off]
-e, --error-rate      Fractional difference two reads can have to overlap. See SGA. 0 is fast, but requires error correcting. [0]
I find them hard to understand, and I can't decide which values to use to detect exon duplications. I tested --error-rate 0.5 and 0.7, but SvABA stopped partway through the run. Could you give some advice?
Thank you, Sanghoon
I got answers from the developer, Jeremiah Wala, via email. I am posting them below for other people. I appreciate Jeremiah's kind explanation.
##################################################
- chunk size: I wouldn't touch this parameter. Unless your BAM file is something like 100 Mb, you don't want to set it to 0. It's a memory management issue, and shouldn't affect sensitivity.
- read-tracking: this just makes the output files much bigger by recording the names of the reads associated with each variant. It doesn't affect performance.
- error-rate: this is another parameter that you probably should not change. The default is to do error-correction during the assembly, which is the right approach.
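In practice, the advice above amounts to running SvABA with its defaults. Here is a minimal Python sketch that only assembles the command line. It leaves --chunk-size and --error-rate unset and adds --read-tracking only on request. The helper name build_svaba_command and the file names are made up for illustration. The -t, -G, -a and -p options are written as I remember them from the svaba README, so please check them against `svaba run --help` for your version.

```python
import shlex

def build_svaba_command(bam, reference, analysis_id, threads=1, read_tracking=False):
    """Assemble a `svaba run` command that keeps the default chunk size and error rate.

    Hypothetical helper for illustration; it does not run anything.
    """
    # -t: input BAM, -G: reference FASTA, -a: analysis ID, -p: threads
    # (as recalled from the svaba README; verify with `svaba run --help`).
    cmd = ["svaba", "run", "-t", bam, "-G", reference, "-a", analysis_id, "-p", str(threads)]
    if read_tracking:
        # Only changes output size (read names are recorded per variant).
        cmd.append("--read-tracking")
    # --chunk-size and --error-rate are deliberately not passed, so defaults apply.
    return cmd

# Placeholder file names, not real data.
cmd = build_svaba_command("sample.bam", "ref.fa", "exon_dup_test", threads=4)
print(shlex.join(cmd))
```

Passing the resulting list to subprocess.run would start the job; printing it first makes it easy to confirm that no non-default assembly parameters slipped in.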
Overall, you should be able to use svaba to detect exon duplication and deletion in whole-genome sequencing. It won't work in whole-exome though, since you won't be able to see the breakpoints.
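To make the whole-exome point concrete, here is a toy Python illustration with made-up coordinates (not a real gene, and not part of Jeremiah's answer). It assumes the usual situation where an exon duplication has both breakpoints in the flanking introns, while exome capture only sequences the exons plus a little padding.

```python
# Toy capture targets (start, end), roughly one per exon. Made-up coordinates.
exon_targets = [(1000, 1200), (5000, 5150), (9000, 9300)]

# Hypothetical tandem duplication of the middle exon:
# both breakpoints sit in the neighbouring introns.
dup_breakpoints = (3500, 7200)

def is_covered(pos, targets, padding=100):
    """Return True if pos falls inside any target, allowing `padding` bp of flank."""
    return any(start - padding <= pos <= end + padding for start, end in targets)

covered = [is_covered(bp, exon_targets) for bp in dup_breakpoints]
print(covered)  # [False, False]
```

Neither breakpoint has read coverage in this toy exome, so there are no reads spanning the junction for an assembler to work with. In whole-genome data the introns are sequenced too, which is why the breakpoints become visible.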