dna-seq-varlociraptor
Gridss
This implements my gridss calling pipeline, which is supposed to replace delly as the structural variant caller. Gridss only calls breakends, and varlociraptor does not support breakends yet, so I suggest that with this PR we keep both callers side by side at first. I would also like to implement purple and linx for CNV and fusion gene calling, both of which require gridss. Once varlociraptor has been modified to handle breakends, we should remove delly in a second step.
PS: Don't change the file structure! gridss requires a very specific file structure for the assemble step!!!
Thanks a lot! Wow, this looks like quite a lot of reverse engineering work. For my understanding: what are the main advantages of breaking gridss into pieces compared to just running gridss.sh?
gridss.sh is already broken into these pieces, but it executes them one after another in bash. Now, with each step being a rule, independent jobs can run in parallel. Also, not every step requires the same amount of resources (primarily cores, but we could also measure memory). The number of threads for each rule is the number of cores I observed that step using on one of our servers. And of course you know the benefits of using snakemake, like recalculating intermediate files and automatically updating downstream results. Additionally, I noticed that we don't need every metric that is produced, so we can trim the code to the pipeline's requirements (I haven't done that yet). Furthermore, I am thinking about replacing samtools with sambamba and introducing pipes to avoid intermediate temp files and improve performance.
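To make the parallelism argument concrete, here is a rough Python sketch of a greedy scheduler that starts every ready step that fits into the free cores, which is roughly what happens when each step is its own rule with its own thread count. The step names only loosely follow the stages discussed here, and the thread counts, runtimes, and core budget are invented for illustration; they are not measurements from our servers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    threads: int      # cores the step is assumed to need (hypothetical)
    minutes: int      # assumed runtime (hypothetical)
    deps: tuple = ()  # names of steps that must finish first


# Hypothetical numbers, chosen only to illustrate the effect.
STEPS = [
    Step("preprocess_normal", threads=2, minutes=30),
    Step("preprocess_tumor", threads=2, minutes=30),
    Step("assemble", threads=8, minutes=60,
         deps=("preprocess_normal", "preprocess_tumor")),
    Step("call", threads=4, minutes=20, deps=("assemble",)),
]


def makespan(steps, cores):
    """Total wall time when every ready step that fits is started at once."""
    done = set()     # names of finished steps
    running = {}     # name -> (finish time, threads held)
    pending = list(steps)
    now, free = 0, cores
    while pending or running:
        # Start each pending step whose dependencies are done and that fits.
        for step in list(pending):
            if all(d in done for d in step.deps) and step.threads <= free:
                running[step.name] = (now + step.minutes, step.threads)
                free -= step.threads
                pending.remove(step)
        if not running:
            raise ValueError("a step can never start (too few cores or unmet dependency)")
        # Advance the clock to the next step that finishes and release its cores.
        name = min(running, key=lambda n: running[n][0])
        now, threads = running.pop(name)
        done.add(name)
        free += threads
    return now


sequential = sum(step.minutes for step in STEPS)  # one step after another
parallel = makespan(STEPS, cores=8)
print(f"sequential: {sequential} min, per-step rules: {parallel} min")
```

With these made-up numbers the two preprocessing jobs share the machine, so the total drops from 140 to 110 minutes. The real gain depends on the actual per-step runtimes and on how many samples are processed.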
I would really appreciate having this PR merged.
Thanks, makes sense to me. I'd like to hold off on merging until varlociraptor supports gridss output. I hope this will happen within the next two weeks or so. For readability and also testing, I wonder whether we should move this into wrappers first. The commands are really ugly.
Thinking about it, this will also enable us to work around the weird path restrictions.