medaka
Parallelization across regions
Thank you. Would you consider adding a feature so that it detects a set number of regions or chromosomes and parallelizes itself? Also, it does not seem that medaka_haploid_variant is able to take --regions as an input. How should I run it independently on each chromosome?
Originally posted by @jpn2021 in https://github.com/nanoporetech/medaka/issues/263#issuecomment-793130746
@jpn2021 @kirk3gaard
It looks like an oversight that medaka_haploid_variant doesn't take a --regions argument like other programs. We can look at adding that.
More generally, the medaka programs don't implement parallelization across chromosomes/regions for two reasons: a) most tasks are trivially parallelizable, so the programs can simply be run multiple times, once per region; and b) handling hardware resources is subtle: for example, parallelization in a CPU-only setting requires a different strategy from a single- or multi-GPU setting.
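To make point a) concrete, here is a minimal Python sketch of running one job per contig in parallel. It is not part of medaka, and several things are assumptions: contig names are read from a samtools-style `.fai` index, and `COMMAND_TEMPLATE` is a hypothetical placeholder (check `--help` for the medaka program and version you use; as noted above, medaka_haploid_variant does not currently accept --regions). The helper names `read_contigs`, `build_command` and `run_all` are illustrative only. By default it does a dry run and just returns the commands.

```python
"""Sketch: launch one medaka job per contig, a few at a time.

Assumptions (hypothetical, not medaka's own tooling): contig names come from
a samtools-style .fai index, and COMMAND_TEMPLATE is a placeholder that must
be adapted to the medaka subcommand and flags your version actually supports.
"""
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

# Hypothetical command template; "{region}" and "{out}" are filled per contig.
COMMAND_TEMPLATE = [
    "medaka", "consensus", "calls.bam", "{out}",
    "--model", "MODEL", "--regions", "{region}",
]


def read_contigs(fai_path):
    """Return contig names: the first tab-separated column of a .fai index."""
    names = []
    for line in Path(fai_path).read_text().splitlines():
        if line.strip():  # skip blank lines
            names.append(line.split("\t", 1)[0])
    return names


def build_command(template, region, out):
    """Substitute the region and output path into each template argument."""
    return [arg.format(region=region, out=out) for arg in template]


def _run(cmd):
    """Run one external command, raising CalledProcessError on failure."""
    return subprocess.run(cmd, check=True)


def run_all(fai_path, template=COMMAND_TEMPLATE, max_workers=4, dry_run=True):
    """Build one command per contig; if not a dry run, execute them in parallel.

    max_workers caps concurrent jobs; tune it to your CPU/GPU resources.
    """
    commands = [
        build_command(template, contig, f"{contig}.hdf")
        for contig in read_contigs(fai_path)
    ]
    if dry_run:
        return commands
    # Threads are enough here: each worker only waits on an external process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(_run, commands))
```

Keeping `max_workers` small reflects point b): a few concurrent CPU jobs may be fine, but several jobs sharing one GPU can contend for memory, so the right level of parallelism depends on your hardware. The per-region outputs then need to be combined by whatever downstream step your workflow uses.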
Since medaka is fundamentally a piece of algorithm research, implementing some of these niceties takes a back seat to investigating new methods. We endeavour to follow the Unix philosophy of creating composable tools that each do one job, so that users can combine them flexibly in whatever way suits their situation.