The time and Mem needed to run speedseq SV

Open sususy opened this issue 5 years ago • 3 comments

Hi, when I ran speedseq SV on more than 1000 samples, the running time varied from 2 h to 36 h. It turned out that the sv.vcf.gz files that took more than 24 hours to generate had a large amount of data in one particular chromosome region. However, the information in this region is not useful to me. Do you have any suggestions for reducing the running time, for example by skipping that region? Thank you!

sususy • Jul 30 '18 18:07

Yes, there is an exclude file that can be passed when running speedseq. This should be a BED file of regions you would like to exclude. The exclude files that we use for human data are available in the speedseq repository here: https://github.com/hall-lab/speedseq/tree/master/annotations

I believe the option you're interested in within speedseq sv is -x.
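As an illustration of preparing such an exclude file, here is a minimal Python sketch that adds an unwanted region to an existing exclude BED and merges any overlapping intervals. It is not taken from the speedseq documentation; the function names and the coordinates in the usage note are hypothetical, and it assumes plain three-column BED input (chrom, start, end).

```python
from typing import Iterable, List, Tuple

# (chrom, start, end), using BED's 0-based, half-open coordinates
Region = Tuple[str, int, int]


def parse_bed(lines: Iterable[str]) -> List[Region]:
    """Read the first three columns of BED lines, skipping headers and blanks."""
    regions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        regions.append((fields[0], int(fields[1]), int(fields[2])))
    return regions


def merge_regions(regions: Iterable[Region]) -> List[Region]:
    """Sort regions and merge those that overlap or touch on the same chromosome."""
    merged: List[Region] = []
    for chrom, start, end in sorted(regions):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


def to_bed(regions: Iterable[Region]) -> str:
    """Format regions as tab-separated BED text."""
    return "".join(f"{chrom}\t{start}\t{end}\n" for chrom, start, end in regions)
```

For example, calling `merge_regions(parse_bed(open("existing.exclude.bed")) + [("1", 150, 300)])` and writing `to_bed(...)` of the result to a new file gives a combined BED that could then be supplied through the -x option. The file name and the region here are placeholders, not real coordinates.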

ernfrid • Jul 31 '18 22:07

Thanks a lot for your reply. That's really helpful. But I am wondering how the regions in ceph18.b37.lumpy.exclude.2014-01-15.bed were chosen. Or are they just examples? Thank you!

sususy • Aug 03 '18 15:08

Take a look at the annotations section of the speedseq README: https://github.com/hall-lab/speedseq#annotations

ernfrid • Aug 26 '18 14:08