svtools
Time and memory needed to run speedseq sv
Hi, when I ran speedseq sv on over 1000 samples, the running time varied from 2 to 36 hours. It turned out that the sv.vcf.gz files that took more than 24 hours to generate had a large amount of data in a particular chromosome region. However, the information in this region is not useful to me. Do you have any suggestions for reducing the running time, such as skipping that region? Thank you!
Yes, there is an exclude file that can be passed when running speedseq. This should be a BED file of regions you would like to exclude. The exclude files that we use for human data are available in the speedseq repository here: https://github.com/hall-lab/speedseq/tree/master/annotations
I believe the option you're interested in within speedseq sv is -x.
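If the slow region is not already covered by one of the provided exclude files, one possible approach is to append it to a copy of that file and pass the result with -x. The sketch below is only an illustration: the helper names `bed_line` and `extend_exclude_bed`, the chromosome name, and the coordinates are all hypothetical, and it assumes a plain three-column BED file (chromosome, 0-based start, exclusive end, tab-separated).

```python
def bed_line(chrom, start, end):
    """Format one BED record (0-based start, exclusive end)."""
    if start < 0 or end <= start:
        raise ValueError("BED regions need 0 <= start < end")
    return f"{chrom}\t{start}\t{end}"


def extend_exclude_bed(existing_text, extra_regions):
    """Return the existing BED text with extra regions appended.

    existing_text: contents of an exclude BED file as a string.
    extra_regions: iterable of (chrom, start, end) tuples to add.
    Blank lines in the existing text are dropped.
    """
    lines = [ln for ln in existing_text.splitlines() if ln.strip()]
    lines.extend(bed_line(chrom, start, end) for chrom, start, end in extra_regions)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical existing exclude entries and one hypothetical slow region.
    existing = "1\t121485000\t121485500\n"
    slow_regions = [("2", 1000000, 2000000)]
    print(extend_exclude_bed(existing, slow_regions), end="")
```

In practice the existing text would be read from the downloaded exclude file and the output written to a new BED file, which is then the file given to -x. Chromosome names must match the naming used in the BAM files (for example "1" versus "chr1").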
Thanks a lot for your reply. That's really helpful. But I am wondering why the regions in ceph18.b37.lumpy.exclude.2014-01-15.bed were chosen. Or are they just examples? Thank you!
Take a look at the annotations section of the speedseq README: https://github.com/hall-lab/speedseq#annotations