Parallelization across regions

Open cjw85 opened this issue 3 years ago • 1 comments

Thank you. Would you consider adding that as a feature to have it detect a set number of regions or chromosomes and parallelize itself? Also it does not seem like medaka_haploid_variant is able to take --regions as an input. How should I run it independently on each chromosome?

Originally posted by @jpn2021 in https://github.com/nanoporetech/medaka/issues/263#issuecomment-793130746

cjw85, Mar 24 '21 14:03

@jpn2021 @kirk3gaard

It looks like an oversight that medaka_haploid_variant doesn't take a --regions argument like other programs. We can look at adding that.

More generally, the medaka programs don't implement parallelization across chromosomes/regions for two reasons: a) most tasks are trivially parallelizable, so the programs can simply be run multiple times; and b) handling hardware resources is subtle, e.g. parallelization in a CPU-only setting requires a different strategy from a single- or multi-GPU setting.

Since medaka is fundamentally a piece of algorithm research, implementing some of these niceties takes a back seat to investigating new methods. We endeavour to stick to a Unix philosophy of creating composable tools that each do one job, so that users can combine them flexibly in whatever way suits their situation.
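
For anyone wanting to do the "run it multiple times" approach from a script, here is a minimal sketch in Python. It is not part of medaka: build_command and run_per_region are hypothetical helper names, and the command line assumes that medaka consensus accepts --regions and --threads in your installed version (check medaka consensus --help). The file names are placeholders.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(bam, region, out_dir, threads=2):
    """Build one `medaka consensus` call restricted to a single region.

    Assumption: `medaka consensus` accepts `--regions` and `--threads`;
    verify against `medaka consensus --help` before relying on this.
    """
    return [
        "medaka", "consensus", bam, f"{out_dir}/{region}.hdf",
        "--regions", region, "--threads", str(threads),
    ]


def run_per_region(regions, make_cmd, max_jobs=4):
    """Run one independent process per region, at most `max_jobs` at once.

    `make_cmd` maps a region name to an argument list for subprocess.
    Returns a dict mapping each region to its process exit code.
    """
    def run(region):
        return region, subprocess.run(make_cmd(region)).returncode

    # Threads are enough here: each worker only waits on a child process.
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return dict(pool.map(run, regions))
```

A call might look like run_per_region(["contig1", "contig2"], lambda r: build_command("calls_to_draft.bam", r, "out")), with the output directory created beforehand. The max_jobs value is where the hardware question comes in: on a CPU-only machine it would be balanced against the per-process thread count, while on a GPU machine it would more likely be tied to the number of devices available.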

cjw85, Mar 24 '21 14:03