
Proposed Parallel Processing Interface

Open mjallday opened this issue 3 years ago • 2 comments

This issue discusses how to expose an interface in Larky for parallel processing of data. Larky is currently single-threaded, but many files used in batch processing lend themselves to parallelism.

The interface for multiprocessing.map would be something like multiprocessing.map(iterator, transformer), where transformer is a lambda that takes each element along with the ctx and returns the output of the transform.

operations:
- Script:
  lang: starlarky
  script: |
    load("@vgs/multiprocessing", "multiprocessing")
    def process(input, ctx):
      result = '\n'.join(multiprocessing.map(input.split('\n'), lambda x, ctx: vault.put(x[1])))
      return result, ctx

Assume input is a stream-like object for SFTP files, or an HTTP object for HTTP requests.
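If input really is stream-like, the elements could be produced lazily instead of via input.split('\n'). A rough sketch in plain Python (not Larky), assuming the stream can be iterated line by line the way a Python file object can; `iter_lines` is a hypothetical helper and `io.StringIO` only stands in for an SFTP file:

```python
import io


def iter_lines(stream):
    """Yield lines from a file-like object one at a time, without the newline."""
    for line in stream:
        yield line.rstrip("\n")


# io.StringIO is a stand-in for a stream-like SFTP file (assumption).
stream = io.StringIO("a\nb\nc\n")
lines = list(iter_lines(stream))
```

Unlike split('\n'), this does not produce a trailing empty element when the file ends with a newline.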

multiprocessing.map would be an interface to an execution framework such as Spark, which would execute the lambda using the number of processes the customer has provisioned.
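To make the proposed semantics concrete, here is a minimal sketch in plain Python (not Larky, and not backed by Spark). It assumes results come back in input order, that ctx is passed through to every call, and that the worker count maps to what the customer has provisioned. `parallel_map`, `fake_vault_put`, and the `workers` parameter are hypothetical names for illustration only; a thread pool stands in for the real execution framework.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_map(items, transformer, ctx, workers=4):
    """Apply transformer(item, ctx) to each item concurrently, keeping input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in the order of the inputs,
        # regardless of which call finishes first.
        return list(pool.map(lambda item: transformer(item, ctx), items))


def fake_vault_put(value):
    # Hypothetical stand-in for vault.put: returns a fake token for the value.
    return "tok_" + value


lines = "4111\n4242\n5555".split("\n")
tokens = parallel_map(lines, lambda x, ctx: fake_vault_put(x), ctx={}, workers=2)
result = "\n".join(tokens)
```

Threads are used here only because lambdas cannot be pickled for Python's process pools; the real backend could be anything that preserves the same call shape.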

mjallday, Feb 14 '21 22:02