threads.js icon indicating copy to clipboard operation
threads.js copied to clipboard

Should we support passing functions to workers?

Open andywer opened this issue 6 years ago • 25 comments

In v0.x we built the library around the idea of passing functions to the worker that would be serialized, passed as a message, deserialized and then executed in the worker.

This approach has some subtle, yet strong limitations, though. That's why v1.0 does not support passing functions anymore, in order to prevent people from running into those not-always-obvious issues.

Maybe there are use cases where one really needs to be able to pass functions to workers. Then we need to make sure, though, that the user is aware of the passed function's limitations:

  • Cannot access any variable declared outside the functions scope
  • Cannot import or require() other modules in the function (at least not in the browser when bundling)

In response to #141, #144.

andywer avatar Aug 08 '19 08:08 andywer

I'm against passing functions. It feels very messy to me. I think the ServiceWorker-like approach by loading workers as separate scripts that can be individually bundled is the right way to go. Also it feels easier this way to implement a rollup plugin that specifically supports this library in case it is needed.

philkunz avatar Aug 25 '19 14:08 philkunz

I would LOVE it if functions can be passed to the workers!!! I get that it can become a minefield for some developers...but with a giant disclaimer tag and simple notes it shouldn't be crazy.

Something simple like:

  • Don't pass functions that have global dependencies out of scope
// BAD
function queso(path, callback) {
  return fs.readFileSync(path);
}

// OKAY...
function queso(path, callback) {
  conts fs = require("fs");
  return fs.readFileSync(path);
}
  • Expect function to be unable to receive parameters by referenced - they will be serialized

luiscarbonell avatar Aug 28 '19 16:08 luiscarbonell

@luiscarbonell Thank you for your feedback! What's your use case where you would benefit so much from passing functions?

andywer avatar Aug 28 '19 16:08 andywer

TL;DR:

I have bunch of instances of a class that have their prototype chain mutated on creation. I want to thread these instances but they are uniquely different and I don't always know what the function will look like on the "other end".


I'm trying to build an AI framework liquidcarrot/carrot. At times there can be many instances of a class Network "competing" for the same goal (be a "smarter" network) - because the networks are serializable functions, I would love to pass them to a "thread".

However, if I HAVE TO use a file, then I'm forced to create a file for every network that exists.

I'm currently trying to "deal" with this issue by using cuid and creating/destroying unique files as the group of networks get better/worse.

But the overhead of creating a file every time...is a bit intense.

luiscarbonell avatar Aug 28 '19 17:08 luiscarbonell

Full disclaimer: right now I'm playing around with threads and microjob - to see which works best for my use case.

I like the backward compatibility and "universality" (i.e. browser + node) of thread; but the API for microjob is more intuitive and "applicable" to my use-case. microjob isn't backward compatible because they're using an experimental node module - they also don't run in the browser (big turn-off).

luiscarbonell avatar Aug 28 '19 17:08 luiscarbonell

How about then allowing expressing functions as string? It would leave up serialization to the user and even would allow fetching code to execute from remote sources in a very easy manner?

philkunz avatar Aug 28 '19 18:08 philkunz

That would work just fine. If I can pass a someFunction.toString(), I should be good to go 👍

luiscarbonell avatar Aug 28 '19 18:08 luiscarbonell

The more I think about this, it would actually be dope to have a static Worker.createFromFunctionString method available. I imagine doing all sorts of dynamic things with async database FaaS workloads. It would also be interesting if there was some way to shield workers from accessing network or filesystem, like deno style.

philkunz avatar Aug 28 '19 18:08 philkunz

But then again I feel like threads has a really specific usecase: Enabling crossplatform multithreading, which is basically covered feature complete by what is implemented as of now imho.

philkunz avatar Aug 28 '19 18:08 philkunz

I think callbacks could be useful. Sometimes there is information in the main thread that you don't want to send to the worker due to the cost of transmission. And maybe it is the worker who figures out what it needs to query and when, based on the job it is doing. If you could provide a callback for the purpose of querying such data it could allow for efficient communication between the worker and master.

I imagine this scenario:

import { spawn, Worker, Callback } from "threads";

function getElementBBoxById(id) {
  return document.getElementById(id).getElementBBox();
}

const obj = await spawn(new Worker("./thread.js"));

await obj.calculateOptimalLayout(someData, Callback(getElementBBoxById));

In this contrived example the worker can query the DOM using the getElementBBoxById function. I imagine that threads.js would create a proxy between the master and the worker, and provide the worker with the proxy as an async function. When the worker calls the proxy the parameters are marshaled as per normal to the master thread, the master thread then takes the parameters and calls the real callback function, it takes the result and does the marshal in reverse.

jackgeek avatar Oct 24 '19 16:10 jackgeek

Callbacks have some interesting possibilities, maybe you could allow the worker to defer callbacks so they can be processed in batches. Such as:

  // thread.js
 ...  
  const nodeBBox = new Map();
  await Callback.batch(parallel =>
    filteredNodes.forEach(
      parallel(async node =>
        nodeBBox.set(node, await getElementBBoxById(node.id))
      )
    )
  );
 ...

Here all the callback functions used (i.e. getElementBBoxById) in the Callback.batch block would be queued so they can be grouped together into the minimum number of back and forth messages to the master thread. The parallel function allows code in the defer block to signify a block of code that can be potentially grouped into one call.

Here is some (completely untested) pseudo code to illustrate how threads.js could achieve this:

async function batch(callback) {
  // Flip a switch to make all Callback proxies enqueue their
  // parameters to be sent in one grouped message, and return a promise
  // which will resolve when the response to that message comes in.
  callbacks.enableQueueing();

  const batchQueue = [];
  const parallel = executeRequest => (...args) =>
    new Promise(resolve => batchQueue.push([args, executeRequest, resolve]));

  callback(parallel);

  // repeat as many times as necessary to satisfy all the batches
  while (batchQueue.length) {
    const batch = [...batchQueue];
    // clear the queue for the next group of batched requests
    batchQueue.clear();

    batch.forEach(([args, executeRequest, resolve]) =>
      resolve(executeRequest(...args))
    );

    // process all the queued callbacks in one go
    callbacks.flush();
  }

  // Flip switch back to make all Callback queued callback calls and return
  // promises that will resolve to the defered call result
  callbacks.disableQueueing();
}

jackgeek avatar Oct 24 '19 17:10 jackgeek

Interesting idea!

Now we are pretty much getting to the point where all the expose(), Transferable(), Callback(), ... things basically justify an abstract message passing API package of their own 😉

Curious to hear somebody else's thoughts on this proposal.

andywer avatar Oct 24 '19 23:10 andywer

I was able to put something together on a feature branch: feature/callbacks

Turned out to make a major refactoring necessary, but it allows the master thread to pass callbacks to the workers and vice versa, without the limitations mentioned earlier.

By exposing the functions to be called by other threads, but still be run on the local thread (much like the existing expose()), you don't need to be careful about what variables to reference or not.

Usage would look like this:

import { spawn, Callback, Worker } from "threads"

const callback = Callback((item) => {
  return item && item.enabled
})

const worker = await spawn(new Worker("./filter-worker"))
const results = await worker("http://fetch.this/then/filter", callback)

// Clean up once you don't need the callback anymore
callback.release()

Find a working example in this test file.

andywer avatar Jun 21 '20 19:06 andywer

I find it kind of ununsual to not call Callback with new, since you are constructing something here.

philkunz avatar Jun 22 '20 12:06 philkunz

@philkunz Callback is just a function returning another function spiced up with some extra properties. There is no instantiation of any class 😉

andywer avatar Jun 22 '20 12:06 andywer

So does that mean await worker("http://fetch.this/then/filter", () => { console.log('hey there') }) would also work? Or why not use a spiced up Object called a class instead of a Function Object?

Also see: https://google.github.io/styleguide/javascriptguide.xml?showone=Naming#Naming

philkunz avatar Jun 22 '20 12:06 philkunz

I just feel capitalizing anything else than a class or maybe an interface is something I'd stumble over.

philkunz avatar Jun 22 '20 12:06 philkunz

Also I assume Callback is not stringified for the worker since its executed on the mainthread? or did I get something wrong here?

philkunz avatar Jun 22 '20 12:06 philkunz

So does that mean await worker("http://fetch.this/then/filter", () => { console.log('hey there') }) would also work?

Nope, you need to use Callback() to mark the function as a callback and make threads.js expose it to other threads. The Callback()-ed function also comes with a .release() method that you should call once you don't need the callback any longer.

I just feel capitalizing anything else than a class or maybe an interface is something I'd stumble over.

Thanks for the link. For people coming from a traditional OOP-centric world it might feel a bit weird, fair enough. But I think if I saw a lower-case callback I would assume that is the callback, not the thing that turns an arbitrary function into a multi-threading-enabled callback.

andywer avatar Jun 22 '20 12:06 andywer

Since it's just a function, maybe createCallback would be a more descriptive name for it. The capitalized version also makes me think it would be some sort of object / class.

simioni avatar Dec 18 '20 00:12 simioni

Hmm… I see what you guys mean, but for me such a basic feature feels like it should have a short, catchy name as you're likely gonna use it a lot and technically speaking you are instantiating a callback. It's just not an instantiation in the classic class-based way. Somewhat like the built-in Error which can be instantiated without new, too.

Tell you what: If another user comes and says the same thing and no one else sides with me, then let's change the API and drop the discussion 😉

@ngault Do you have thoughts on this?

andywer avatar Dec 22 '20 16:12 andywer

@andywer Question about callbacks, does this mean that if the callback is an operation that is expensive, that it would block the local thread when its being called by a worker?

simplecommerce avatar Jan 28 '21 19:01 simplecommerce

@simplecommerce Interesting question. I guess it would. My intention with that feature was to make the whole setup more generic, allowing you to implement async iterators between threads or event emitters.

Now that I look at it again, this feature would in theory allow for some real performance pitfalls if you use it for the wrong use cases: If the callback is called a lot of times in rapid succession (like in a .map() call) it could cause a massive wave of messages between threads which will likely take a toll on performance. If the callback is expensive, you are probably doing it wrong altogether as now the main thread becomes the bottleneck again as if you weren't using workers in the first place…

It might actually be a bit of a risk to give such a powerful functionality to the user as you could wreck your code's performance if you used callbacks for the wrong use cases / if you felt encouraged to write inherently slow code as it becomes feasible.

andywer avatar Jan 29 '21 16:01 andywer

@andywer Thank you for your elaborate response, I understand :)

simplecommerce avatar Jan 29 '21 16:01 simplecommerce

Since you mentioned waiting for someone else to chime in about capitalized vs not-capitalized, I just thought I'd add my two cents. Callback would actually result in more typing for me than createCallback, because everywhere that I imported it I would end up importing it as import { Callback as callback } from 'threads'; (or maybe import { Callback as createCallback } primarily because I agree with all of the reasoning in https://eslint.org/docs/rules/new-cap

jasonk avatar Apr 22 '21 21:04 jasonk