starlarky
Proposed Parallel Processing Interface
This issue discusses how to expose an interface in Larky for parallel processing of data. Currently Larky is single-threaded, but many files handled in batch processing lend themselves to parallelism.
The interface for multiprocessing.map would be something like multiprocessing.map(iterator, transformer), where transformer is a lambda that takes each element along with the ctx and returns the output of the transform.
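To pin down the proposed semantics, here is a hedged single-threaded reference sketch in plain Python. The name mp_map is hypothetical and stands in for the proposed multiprocessing.map; a real implementation would fan elements out to parallel workers instead of looping.

```python
# Hypothetical reference semantics for the proposed multiprocessing.map.
# A real backend would distribute elements to workers; this version only
# fixes the contract: transformer(element, ctx) per element, order preserved.
def mp_map(iterable, transformer, ctx=None):
    return [transformer(x, ctx) for x in iterable]

print(mp_map([1, 2, 3], lambda x, ctx: x * 2))  # → [2, 4, 6]
```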
operations:
  - Script:
      lang: starlarky
      script: |
        load('@vgs/multiprocessing', 'multiprocessing')

        def process(input, ctx):
            result = '\n'.join(multiprocessing.map(input.split('\n'), lambda x, ctx: vault.put(x[1])))
            return result, ctx
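To make the end-to-end flow of the script above concrete, here is a hedged Python sketch of what it would do to a newline-delimited payload. FakeVault and mp_map are stand-ins I'm assuming for illustration; the real vault module and map backend are not specified in this issue, and the lambda here tokenizes the whole line rather than the x[1] indexing shown above.

```python
# Hypothetical stand-in for the vault: tokenizes a value, returns an alias.
class FakeVault:
    def __init__(self):
        self.store = {}

    def put(self, value):
        alias = "tok_%d" % len(self.store)
        self.store[alias] = value
        return alias

vault = FakeVault()

# Hypothetical single-threaded stand-in for multiprocessing.map.
def mp_map(iterable, transformer, ctx=None):
    return [transformer(x, ctx) for x in iterable]

def process(input, ctx):
    # Split the payload into lines, tokenize each line, and rejoin.
    result = '\n'.join(mp_map(input.split('\n'), lambda x, ctx: vault.put(x)))
    return result, ctx

out, _ = process("4111111111111111\n5500000000000004", None)
print(out)  # → tok_0\ntok_1
```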
Assume input is a stream-like object for SFTP files or an HTTP object for HTTP requests. multiprocessing.map would be an interface to some execution framework, such as Spark, which would execute the lambda using the number of processes the customer has provisioned.
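The "provisioned processes" idea can be sketched with a bounded worker pool. This is an assumption-laden illustration only: PROVISIONED_WORKERS is a hypothetical per-customer limit, and a thread pool stands in for whatever backend (Spark or otherwise) would actually run the lambdas.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-customer concurrency cap; the real value would come
# from the customer's provisioned capacity, not a hard-coded constant.
PROVISIONED_WORKERS = 4

def parallel_map(iterable, transformer, ctx=None):
    # Cap concurrency at the provisioned worker count.
    with ThreadPoolExecutor(max_workers=PROVISIONED_WORKERS) as pool:
        # ThreadPoolExecutor.map yields results in input order, so the
        # output matches what a sequential map would produce.
        return list(pool.map(lambda x: transformer(x, ctx), iterable))

print(parallel_map(range(5), lambda x, ctx: x * x))  # → [0, 1, 4, 9, 16]
```

Keeping results in input order matters here, since the script rejoins the transformed lines with '\n' and the output must line up with the input.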