Gridss

Open christopher-schroeder opened this issue 4 years ago • 5 comments

This implements my gridss calling pipeline, which is supposed to replace delly as the structural variant caller. Gridss only calls breakends, and varlociraptor does not support breakends yet. I suggest that, with this PR, we keep both callers side by side at first. I would also like to implement purple and linx for CNV and fusion gene calling, both of which require gridss. Once varlociraptor has been modified to handle breakends, we should remove delly in a second step.

PS: Don't change the file structure! gridss requires a very specific file structure for the assemble step!!!

christopher-schroeder avatar Mar 25 '20 12:03 christopher-schroeder

Thanks a lot! Wow, this looks like quite some amount of reverse engineering work. For my understanding: what are the main advantages of breaking gridss into pieces, compared to just running gridss.sh?

johanneskoester avatar Mar 27 '20 20:03 johanneskoester

gridss.sh is already broken into these pieces, but it executes them one after another in bash. With each step being a rule, they can run in parallel. Also, not every step requires the same amount of resources (primarily cores, but we could also measure memory). The thread count for each rule is the number of cores we observed that step using on one of our servers. And of course you know the benefits of using snakemake, like recalculating intermediate files and automatically updating downstream results. Additionally, I noticed that we don't need every metric that is produced, so we can adjust the code to the pipeline's requirements (I haven't done that yet). Furthermore, I am thinking about replacing samtools with sambamba and introducing pipes to avoid intermediate temp files, for increased performance.
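To illustrate the parallelism argument, here is a minimal Python sketch of how steps with declared dependencies and thread counts could be grouped into batches that fit a core budget. The step names, thread counts, and the `schedule_waves` helper are hypothetical and chosen only for illustration. This is a simplified toy, not Snakemake's actual scheduler and not the rules in this PR.

```python
from typing import Dict, List

# Hypothetical steps: names and thread counts are illustrative assumptions,
# not the values used in the actual pipeline.
STEPS: Dict[str, dict] = {
    "preprocess_tumor": {"deps": [], "threads": 4},
    "preprocess_normal": {"deps": [], "threads": 4},
    "assemble": {"deps": ["preprocess_tumor", "preprocess_normal"], "threads": 8},
    "call": {"deps": ["assemble"], "threads": 2},
}


def schedule_waves(steps: Dict[str, dict], cores: int) -> List[List[str]]:
    """Greedily group steps into waves that can run side by side.

    A step joins the current wave if all of its dependencies finished in an
    earlier wave and its thread count still fits into the remaining cores.
    """
    done = set()
    waves: List[List[str]] = []
    while len(done) < len(steps):
        wave: List[str] = []
        free = cores
        for name, spec in steps.items():
            if name in done:
                continue
            deps_ready = all(dep in done for dep in spec["deps"])
            if deps_ready and spec["threads"] <= free:
                wave.append(name)
                free -= spec["threads"]
        if not wave:
            # Either a dependency cycle or a step that needs more cores than exist.
            raise ValueError("no runnable step for the given core budget")
        done.update(wave)
        waves.append(wave)
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(schedule_waves(STEPS, cores=8), start=1):
        print(f"wave {i}: {', '.join(wave)}")
```

With 8 cores, the two preprocessing steps share the first wave, while a plain bash script would run them one after the other.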

christopher-schroeder avatar Mar 27 '20 23:03 christopher-schroeder

I would really appreciate having this PR merged.

christopher-schroeder avatar Apr 01 '20 11:04 christopher-schroeder

Thanks, makes sense to me. I'd like to hold off on merging until varlociraptor supports gridss output. I hope this will happen within the next two weeks or so. For readability and also for testing, I wonder whether we should move this into wrappers first. The commands are really ugly.

johanneskoester avatar Apr 06 '20 15:04 johanneskoester

Thinking about it, this will also enable us to work around the weird path restrictions.

johanneskoester avatar Apr 06 '20 15:04 johanneskoester