parallelisation

Open sa-lee opened this issue 6 years ago • 2 comments

from @lawremi:

It would be interesting to explore an API based on BiocParallel; something like:

Set the BiocParallelParam: parallelize(x, param)

Specify chunking:

  • chunk_by(x, variable) # often seqnames
  • chunk_count(x, count)
  • chunk_size(x, size)
  • chunk_by_overlap(x, ranges) # like tiles

That might help with our argument about scalability.

sa-lee (May 08 '18 05:05)

So I'm imagining the design to look fairly similar to (and maybe even inherit from) the GroupedGenomicRanges class, except with two additional slots:

  • a param which is a BiocParallelParam object
  • an iterator function

The parallelize function initializes the param slot, and the chunk_by family of functions initializes the iterator. The core verbs then dispatch over the chunks with bpiterate or bplapply.
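As a rough illustration of the dispatch step, here is a minimal sketch of chunking by seqnames and mapping a function over the chunks with bplapply. Nothing here is plyranges API: `split()` stands in for the proposed `chunk_by()`, the `param` variable stands in for what `parallelize()` would store, and the toy ranges and the per-chunk sum are made up for the example.

```r
library(GenomicRanges)
library(BiocParallel)

# Toy ranges on two chromosomes (illustrative data only).
gr <- GRanges(
  seqnames = c("chr1", "chr1", "chr2", "chr2", "chr2"),
  ranges = IRanges(start = c(1, 50, 10, 200, 400), width = 20),
  score = c(1, 2, 3, 4, 5)
)

# Stand-in for the proposed chunk_by(x, seqnames): one chunk per chromosome.
chunks <- split(gr, seqnames(gr))

# Stand-in for the proposed parallelize(x, param): choose a backend.
# SerialParam() runs in-process; MulticoreParam(workers = 2) is a drop-in
# replacement on Linux/macOS.
param <- SerialParam()

# A per-chunk operation: total score within each chunk.
chunk_totals <- bplapply(
  as.list(chunks),
  function(chunk) sum(S4Vectors::mcols(chunk)$score),
  BPPARAM = param
)
```

A real implementation would presumably also need to recombine the per-chunk results into a single Ranges object, which this sketch leaves out.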

sa-lee (May 31 '18 01:05)

Yea that sounds about right. The iterator might perform a vectorized operation up-front to find the partitions and then use those as it iterates.
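One way this could look, as a sketch only: the partitions are computed once in a vectorized step, and the iterator is a closure that hands out one chunk per call and returns NULL when done, which is the contract bpiterate expects of its ITER argument. `make_chunk_iterator` and its `chunk_size` argument are hypothetical names, loosely modelled on the proposed `chunk_size(x, size)`.

```r
library(GenomicRanges)
library(BiocParallel)

# Toy ranges (illustrative data only).
gr <- GRanges(
  seqnames = c("chr1", "chr1", "chr2", "chr2", "chr2"),
  ranges = IRanges(start = c(1, 50, 10, 200, 400), width = 20)
)

# Hypothetical helper: find all partitions up-front, then iterate over them.
make_chunk_iterator <- function(x, chunk_size) {
  idx <- seq_len(length(x))
  # Vectorized step: assign every range to a chunk in a single pass.
  partitions <- split(idx, ceiling(idx / chunk_size))
  i <- 0L
  function() {
    if (i >= length(partitions)) {
      return(NULL)  # signals to bpiterate that iteration is finished
    }
    i <<- i + 1L
    x[partitions[[i]]]
  }
}

iter <- make_chunk_iterator(gr, chunk_size = 2)

# Apply a function to each chunk as the iterator yields it.
chunk_widths <- bpiterate(
  iter,
  function(chunk) sum(BiocGenerics::width(chunk)),
  BPPARAM = SerialParam()
)
```

Because the closure only indexes into precomputed partitions, the per-iteration cost stays small; the same pattern would apply to partitions derived from seqnames or from overlaps with tiles.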

lawremi (May 31 '18 03:05)