
Persistent context

Open · darky opened this issue 5 years ago · 7 comments

Need the ability to pass some context once up front, so that it is then always available in the worker pool. For example, a CPU-intensive geo task: checking whether a point lies inside polygons. The polygons are so heavy that serialising and deserialising them on every call is expensive. It would be better to pass them once:

await job(() => {
}, { persistentCtx: { polygons: [/* many, many polygons */] } });

And then it is always accessible on every job execution:

await job(() => {
  polygons // accessible here without re-serialisation
  
}, {data: {point: [12.3434, 56.3434]}});

darky · Aug 26 '19 19:08

Related issue: https://github.com/wilk/microjob/issues/42

manuel-di-iorio · Aug 27 '19 06:08

PR: https://github.com/wilk/microjob/pull/48

darky · Aug 28 '19 15:08

@darky Thanks for this issue!

Well, let me check if I got it right: you need a global bucket shared between worker threads to avoid multiple massive serialisations/deserialisations, correct? You could do this yourself with a SharedArrayBuffer (shared memory). However, yes, it could be a useful feature to embed in microjob.

Anyway, your PR moves the serialisation/deserialisation problem from the user to the core: https://github.com/darky/microjob/commit/67c21aec41ec0ddc3903d6f28cfaae490e41fc95#diff-c9253097723f89dd4716748fab2e00cdR108 Every time the user invokes job, the whole persistentCtx gets serialised and sent via postMessage and then deserialised from the worker thread. I think a good solution could be to pass a global shared context from an external facade, convert it to a SharedArrayBuffer and then convert it back with a proper getter from the worker. I wouldn't use the job interface to define a global context: it's ambiguous.
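A minimal sketch of that SharedArrayBuffer idea (the `toShared`/`fromShared` helper names are illustrative, not part of microjob's API):

```javascript
// Serialise the context ONCE on the main thread into shared memory.
// The resulting SharedArrayBuffer can be handed to every worker via
// postMessage without being copied.
function toShared(ctx) {
  const bytes = Buffer.from(JSON.stringify(ctx), 'utf8');
  const sab = new SharedArrayBuffer(bytes.byteLength);
  new Uint8Array(sab).set(bytes);
  return sab;
}

// Getter for the worker side: rebuild the context object from shared
// memory. This runs once per worker, not once per job.
function fromShared(sab) {
  // Buffer.from(TypedArray) copies out of shared memory before decoding.
  return JSON.parse(Buffer.from(new Uint8Array(sab)).toString('utf8'));
}

// Main thread:   const sab = toShared({ polygons: [/* ... */] });
//                worker.postMessage(sab); // shares the buffer, no copy
// Worker thread: const ctx = fromShared(sab);
```

Note the trade-off: the bytes are shared, but each worker still materialises its own object from them once; truly zero-copy access would require a binary layout readable in place.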

wilk · Sep 03 '19 17:09

Every time the user invokes job, the whole persistentCtx gets serialised and sent via postMessage and then deserialised from the worker thread.

It occurs only once, the first time; after that it is always available via https://github.com/darky/microjob/commit/67c21aec41ec0ddc3903d6f28cfaae490e41fc95#diff-5bfbc2def8d97c3939b537c3f6f31b3eR3

I think a good solution could be to pass a global shared context from an external facade, convert it to a SharedArrayBuffer and then convert it back with a proper getter from the worker.

Can you please provide a little example? You could also close #42 with that example :)

darky · Sep 03 '19 18:09

I wouldn't use the job interface to define a global context: it's ambiguous.

Yep, agreed. Maybe it would be better to use the start function for this purpose?
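That proposal could look something like this (a hypothetical API sketch: `persistentCtx` on `start` is the option proposed in this thread, not part of microjob's released interface; `start`, `job`, and `stop` are microjob's existing functions):

```js
const { start, job, stop } = require('microjob');

// Declare the heavy context once, at pool start-up.
await start({
  maxWorkers: 4,
  persistentCtx: { polygons: [/* loaded once */] },
});

// Later, in any job, `polygons` would already be in the worker's scope;
// only the small per-call payload goes through postMessage.
const count = await job(() => polygons.length, {
  data: { point: [12.3434, 56.3434] },
});

await stop();
```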

darky · Sep 03 '19 18:09

Yep, agreed. Maybe it would be better to use the start function for this purpose?

In this scenario, would persistentCtx be mutable (from within a job, for example)?

I have a bit of a weird use case:

  • In one job that runs every N minutes, some data is passed in via context; a synchronous algorithm builds a sharded index from that data and returns it from the job to the main thread.
  • This index is stored in memory along with the data, where a synchronous search algorithm uses both to compute search results.

Ideally I'd like to be able to do the following:

  • Keep both the index and the data in the job's persistent state (mutable)
  • Run the search algorithm inside jobs, instead of on the main thread as it is now

Unfortunately, the serialization cost is too high without persistent state, and mutable state would be a real advantage; otherwise I'd have to stop and start a new worker pool every time I need to update the dataset.

r3wt · Sep 26 '19 23:09

@r3wt the #48 PR can satisfy your needs regarding mutation

darky · Sep 27 '19 18:09