threads.js

Can I require stuff and initialize services and objects outside the function of a function worker?

Open andrec93p opened this issue 4 years ago • 2 comments

Hello and thank you for this amazing package. I've been using it for years at this point, and it has enabled some really cool things.

I have a long-running project that uses this package, and I recently updated it from version 0.8 to 1.6, making every adjustment the code needed. I must say I really like how the new version works, much better and cleaner than v0. However, I have a question that I can't answer from either the code or the documentation.

Suppose I have a pool of workers, with each worker using its own instance of one or more services, like the API client for a specific library (specifically, the Firebase and APN clients to send push notifications to mobile devices). I want to initialize the services when each worker in the pool is first spawned, and I want a reused worker to keep the same instances it already initialized. Of course, whenever a worker dies and a new one spawns, the new one will have to reinitialize them and get new instances. No problem with that.

So we're talking about something close to this.

const Worker = require("threads/worker");
const someOtherLibrary = require('some-other-library');
const someConfiguration = require('./config/some-configuration');

// Module scope: this runs once, when the worker thread starts.
const service1 = someOtherLibrary.initializeService1(someConfiguration);
const service2 = someOtherLibrary.initializeService2(someConfiguration);

// This runs on every call and reuses the instances created above.
Worker.expose(async function() {
    service1.doSomething(someConfiguration);
    service2.doSomethingElse(someConfiguration);
    // ...
});

Would this work? Does it fit your vision for this package? If not, how can I correct it?

Thank you.

EDIT: I already made a few attempts and this seems to work, but I wanted to know your own thoughts about it.
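
For illustration, here is a minimal, library-free sketch of why this pattern works. It uses Node's built-in worker_threads module directly rather than threads.js, so it only demonstrates the underlying mechanism: code at the top level of a worker script runs once per thread, while the message handler runs once per task. The names service, initCount and runTask are hypothetical stand-ins, not part of any library.

```javascript
const { Worker } = require("node:worker_threads");

// Worker source, inlined so the sketch fits in a single file.
// `initCount` and `service` are hypothetical stand-ins for a real client.
const workerSource = `
  const { parentPort } = require("node:worker_threads");

  // Module scope: runs once, when this worker thread starts.
  let initCount = 0;
  const service = { id: ++initCount, calls: 0 };

  // Message handler: runs once per task and reuses the same service.
  parentPort.on("message", () => {
    service.calls += 1;
    parentPort.postMessage({ initCount, calls: service.calls });
  });
`;

// Sends one task to the worker and resolves with its reply.
function runTask(worker) {
  return new Promise((resolve, reject) => {
    worker.once("message", resolve);
    worker.once("error", reject);
    worker.postMessage("go");
  });
}

// Runs two tasks on the same worker, then shuts it down.
async function demo() {
  const worker = new Worker(workerSource, { eval: true });
  const first = await runTask(worker);
  const second = await runTask(worker);
  await worker.terminate();
  return [first, second];
}

demo().then((replies) => console.log(replies));
```

Both replies report an initCount of 1 while calls goes up, which is the behavior described above: the service is created once and reused for as long as the worker lives. A replacement worker would run the top-level code again and get a fresh instance.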

andrec93p, May 20 '21 16:05

Hey @andrec93p. Sorry, but I have been sick all week.

Interesting point, now that I think about it a bit more. Technically, you can do everything perfectly fine as things are, but from a developer experience point of view it might be quite beneficial to provide some way to define a "worker constructor" as it's a pretty common and generic use case and will hopefully lead to cleaner code.

~~I don't like to have two very different ways to do an identical thing, but in this case it might be good to have a class-based and a functional way to do that.~~ Just realized that a class-based API wouldn't be any good or would need to be overly hacky as the initialization code will frequently need to be async and constructors are necessarily sync.

How about this as a first idea?

import { expose } from "threads/worker"

expose({
  async construct() {
    // …
  },

  // …
})
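
Until something like construct() exists, asynchronous setup can be approximated with plain JavaScript by memoizing the initialization promise at module scope. The sketch below is not a threads.js API: initializeService, getService and workerFunction are hypothetical names, and workerFunction stands in for whatever function would be passed to expose().

```javascript
// Hypothetical async initializer standing in for a real client setup.
let initCalls = 0;
async function initializeService() {
  initCalls += 1;
  await new Promise((resolve) => setTimeout(resolve, 10));
  return { send: (msg) => `sent:${msg}` };
}

// Memoize the promise so initialization starts at most once per worker,
// even if several calls arrive before it has finished.
let servicePromise = null;
function getService() {
  if (servicePromise === null) {
    servicePromise = initializeService();
  }
  return servicePromise;
}

// Stand-in for the function that would be passed to expose().
async function workerFunction(msg) {
  const service = await getService();
  return service.send(msg);
}

// Two concurrent calls share a single initialization.
Promise.all([workerFunction("a"), workerFunction("b")]).then((results) => {
  console.log(results, "initialized", initCalls, "time(s)");
});
```

One caveat with this approach: if the initializer rejects, the rejected promise stays cached, so a real worker would probably want to reset servicePromise on failure or let the worker die and be replaced.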

andywer, May 23 '21 13:05

Hello @andywer, don't worry, I hope you feel better now.

It's great that this works. It's a very interesting possibility.

As for the idea of a constructor, that looks fine to me, but I'm not so sure about it. As a JavaScript developer, it feels intuitive to import and initialize modules and services first, and then declare a function that uses them. That's how I arrived at this solution naturally: it's just what I would normally do in Node.js. What I currently find lacking is the documentation, which in my opinion is not clear enough about how workers are initialized. I can't deny that using workers in Node.js is, to me at least, still a little obscure and rare. Your module makes it very easy and natural, and that's a huge plus, but I think some more clarification, even about the "obvious" things, would help a lot of developers who find themselves in the same position.

(Speaking of documentation, I'd like more explanation of pool events too, as I had to discover how they work by reading the code. Not hard for a developer used to Node.js, but still a bit annoying that there's no example to start from.)

About the specific proposal, using a constructor in a functional context seems a little odd to me, but it might work if it is documented well enough. The thing that puzzles me most is how the variables initialized in the constructor would be passed to the worker function's closure. Maybe the constructor could return an object that is then passed to the worker function as a parameter? What about the other parameters the worker function may have, though? Or would the variables still need to be declared outside? A global object?

Also, the idea of a worker constructor might be confusing for developers who, like me, initialize a pool to spawn a number of workers and then reuse them as the application runs. As obvious as it may seem, an example like that would leave me wondering whether the constructor is called every time my worker runs or only when it is first spawned. Again, more of a documentation problem than anything.

This has me thinking: it could be interesting to offer more ways to organize a worker's code by defining more than one function, each called at a specific point in the lifecycle. One could, for example, have a global object named worker to store variables in, and have the expose function accept the worker function, which works the same as it does now, plus an object as a second parameter whose keys make clear what each value does. Something like this:

import { expose } from "threads/worker"

expose(
  async function () {
    // This is the worker function.
    worker.service1.doSomething();
  },
  {
    onSpawn: async function () {
      // This is called only when the worker is first spawned.
      worker.service1 = initializeService1();
    },
    onCalled: async function () {
      // This is called every time before the worker function runs.
    },
    onFinish: async function () {
      // This is called every time after the worker function is done.
    },
    onDestroy: async function () {
      // This is called when a worker is destroyed by the main process.
    }
    // …
  }
);
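
To make the idea concrete, the hooks could be emulated today with a small wrapper around the worker function. This is only a sketch under assumptions: withLifecycle is a hypothetical helper, not something threads.js provides, and it passes a shared context object as the first argument instead of relying on a global worker object. onDestroy is left out because a plain wrapper has no way to observe the worker being terminated.

```javascript
// Hypothetical helper, not part of threads.js: wraps a worker function so
// that lifecycle hooks run around it. `ctx` plays the role of the shared
// `worker` object from the proposal above.
function withLifecycle(fn, { onSpawn, onCalled, onFinish } = {}) {
  const ctx = {};
  let spawned = null; // memoized promise, so onSpawn runs at most once

  return async function wrapped(...args) {
    if (spawned === null) {
      spawned = Promise.resolve(onSpawn ? onSpawn(ctx) : undefined);
    }
    await spawned;
    if (onCalled) await onCalled(ctx);
    try {
      return await fn(ctx, ...args);
    } finally {
      // Runs after every call, whether the worker function succeeded or threw.
      if (onFinish) await onFinish(ctx);
    }
  };
}

// Usage: the wrapped function is what would be handed to expose().
const events = [];
const greet = withLifecycle(
  async (ctx, name) => `${ctx.greeting}, ${name}`,
  {
    onSpawn: async (ctx) => { events.push("spawn"); ctx.greeting = "hello"; },
    onCalled: async () => { events.push("called"); },
    onFinish: async () => { events.push("finish"); },
  }
);

greet("first")
  .then(() => greet("second"))
  .then((result) => console.log(result, events));
```

Passing the context as an explicit first parameter is one possible answer to the question of how initialized values reach the worker function: the remaining parameters are forwarded unchanged after it.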

Again though, I don't think this is a huge problem. Just updating the documentation to make clear how workers are initialized would be more than enough for many use cases in my opinion.

andrec93p, May 24 '21 08:05