js-lingui

Run `extractFromFiles` in lingui-extract-experimental.ts concurrently?

Open • yunsii opened this issue 2 years ago • 8 comments

Is your feature request related to a problem? Please describe.

For a big project, `extractFromFiles` is too slow. After investigating, I'd like to propose running `extractFromFiles` in lingui-extract-experimental.ts concurrently. What do you think?

https://github.com/lingui/js-lingui/blob/04b7cef9876168bc376b55424ecc0b0ebde976c1/packages/cli/src/lingui-extract-experimental.ts#L73

Describe proposed solution

p-limit?

yunsii • Oct 30 '23

I considered implementing a worker thread pool for this, but for the first iteration I stopped at the current implementation. You are probably the first user who has really started experimenting with it, so we are only now starting to get feedback.

If you have the capacity to implement worker threads, I would be happy to help. But for now I don't have the capacity to do it on my own.
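As a rough illustration of the worker thread pool idea, here is a minimal sketch using Node's built-in `node:worker_threads` module. It is not Lingui code: `WorkerPool`, `extractAllEntries`, `EntryResult`, and the worker body (which only hashes the entry name as a stand-in for extracting messages from one bundle) are all hypothetical. The worker source is passed as a string with `eval: true` so the example fits in one file; a real implementation would point each worker at a separate module.

```typescript
import { Worker } from "node:worker_threads";

// Hypothetical per-entry result: plain, serializable data only.
interface EntryResult {
  entry: string;
  hash: number;
}

// Worker body, evaluated as CommonJS JavaScript via `eval: true`.
// The loop is a CPU-bound stand-in for running Babel over one entry bundle.
const WORKER_SOURCE = `
const { parentPort } = require("worker_threads");
parentPort.on("message", ({ id, entry }) => {
  let hash = 0;
  for (let i = 0; i < 200000; i++) hash = (hash * 31 + entry.length + i) >>> 0;
  parentPort.postMessage({ id, result: { entry, hash } });
});
`;

type Pending = { resolve: (r: EntryResult) => void; reject: (e: unknown) => void };

class WorkerPool {
  private readonly workers: Worker[] = [];
  private readonly idle: Worker[] = [];
  private readonly queue: { id: number; entry: string }[] = [];
  private readonly pending = new Map<number, Pending>();
  private nextId = 0;

  constructor(size: number) {
    for (let i = 0; i < size; i++) {
      const worker = new Worker(WORKER_SOURCE, { eval: true });
      worker.on("message", (msg: { id: number; result: EntryResult }) => {
        this.pending.get(msg.id)?.resolve(msg.result);
        this.pending.delete(msg.id);
        this.idle.push(worker); // the worker is free again
        this.drain();
      });
      worker.on("error", (err: Error) => {
        // Minimal handling: fail every outstanding task.
        this.pending.forEach((p) => p.reject(err));
        this.pending.clear();
      });
      this.workers.push(worker);
      this.idle.push(worker);
    }
  }

  // Queue one entry; resolves when some worker has processed it.
  run(entry: string): Promise<EntryResult> {
    return new Promise<EntryResult>((resolve, reject) => {
      const id = this.nextId++;
      this.pending.set(id, { resolve, reject });
      this.queue.push({ id, entry });
      this.drain();
    });
  }

  // Hand queued entries to idle workers.
  private drain(): void {
    while (this.idle.length > 0 && this.queue.length > 0) {
      this.idle.pop()!.postMessage(this.queue.shift()!);
    }
  }

  async destroy(): Promise<void> {
    await Promise.all(this.workers.map((w) => w.terminate()));
  }
}

// Extract every entry on a pool of `threads` workers; results keep input order.
async function extractAllEntries(entries: string[], threads: number): Promise<EntryResult[]> {
  const pool = new WorkerPool(threads);
  try {
    return await Promise.all(entries.map((entry) => pool.run(entry)));
  } finally {
    await pool.destroy();
  }
}
```

Tasks and results cross the thread boundary as plain data, and the pool hands the next queued entry to whichever worker becomes idle. Error handling is deliberately minimal: a crashed worker rejects all pending tasks and is not replaced.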

timofei-iatsenko • Oct 30 '23

BTW, p-limit doesn't really help here: during extraction we invoke Babel on bundles, each of which is a single big file. Babel is CPU-bound and synchronous by nature, and the surrounding bundling code doesn't have many async operations either, so you won't benefit from running them concurrently in one Node process.
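A small sketch of why an async concurrency limiter does not help with synchronous, CPU-bound work. `transformBundleSync` and `extractAll` are hypothetical; the former is a busy loop standing in for a Babel pass over one bundle. Even plain `Promise.all`, which imposes no limit at all, runs the bodies one after another on the main thread. p-limit can only cap how many promises are in flight, so it cannot add parallelism that `Promise.all` lacks.

```typescript
// Hypothetical stand-in for a synchronous, CPU-bound Babel pass over one bundle.
function transformBundleSync(name: string, log: string[]): number {
  log.push(`start ${name}`);
  let acc = 0;
  for (let i = 0; i < 1_000_000; i++) acc = (acc + i) % 1_000_003;
  log.push(`end ${name}`);
  return acc;
}

// Looks concurrent, but each async body runs to completion on the single
// main thread before the next one starts: it never yields to the event loop.
async function extractAll(bundles: string[], log: string[]): Promise<number[]> {
  return Promise.all(bundles.map(async (b) => transformBundleSync(b, log)));
}
```

The log comes out strictly as `start a, end a, start b, end b, ...`; nothing overlaps. Real parallelism for this kind of work needs separate threads or processes.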

timofei-iatsenko • Oct 30 '23

Thanks for your patient explanation. I'm not familiar with worker threads, but I'm very interested in them. I'll study the theory first and would be glad to join the work if possible.

yunsii • Oct 30 '23

FYI https://www.npmjs.com/package/jest-worker

timofei-iatsenko • Oct 30 '23

The caveat of working with workers: you don't have shared memory between them. Treat them as a few standalone Node.js programs run by another one.

So if you want to expose something to all workers, you can't just store it in a global variable. Passing data between the main and child processes is usually done by serializing it and storing it somewhere, then reading and deserializing it on the other side. So you can't pass something non-serializable, such as a function or a class instance, from the main process to a child.
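To see this boundary concretely: `worker_threads` copies messages with the structured clone algorithm, and `postMessage` throws a `DataCloneError` for values it cannot clone, such as functions. The sketch below probes that with a `MessageChannel` from `node:worker_threads`. `canSendToWorker` and both config objects are hypothetical and only loosely modeled on a Lingui config.

```typescript
import { MessageChannel } from "node:worker_threads";

// Returns true if `value` can be copied by postMessage, i.e. by the structured
// clone algorithm that worker_threads uses between the main thread and workers.
function canSendToWorker(value: unknown): boolean {
  const { port1, port2 } = new MessageChannel();
  try {
    port1.postMessage(value);
    return true;
  } catch {
    // Functions (e.g. a custom formatter) cause a DataCloneError.
    return false;
  } finally {
    port1.close();
    port2.close();
  }
}

// Hypothetical, simplified configs for illustration only.
const plainConfig = { locales: ["en", "cs"], sourceLocale: "en" };
const configWithFormatter = {
  ...plainConfig,
  format: (messages: unknown) => JSON.stringify(messages),
};
```

Class instances do not throw, but they arrive on the other side as plain objects without their prototype, so their methods are lost.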

In Lingui there might be a few places where this is needed, and they should be redesigned.

  • Passing a Lingui config to child workers: the config is not serializable to JSON, because it might contain custom formatters / extractors as functions. So instead you need to read the config in each thread on its own (this might bring significant overhead!)
  • Passing a Catalog instance: this should just be designed in a different way.
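One way to avoid passing a Catalog instance across threads, sketched under assumptions: each worker returns only plain data for the entry it processed, and the main thread merges those results before building the catalog. `ExtractedMessage` and `mergeWorkerResults` are hypothetical and much simpler than Lingui's real extracted-message type.

```typescript
// Hypothetical, simplified shape of what a worker could return per entry:
// plain data only, no Catalog instance.
interface ExtractedMessage {
  id: string;
  message?: string;
  origin: [string, number][]; // [file, line] pairs
}

// Runs on the main thread: merges per-entry results from all workers into
// one map keyed by message id, concatenating the origins of duplicates.
function mergeWorkerResults(perEntry: ExtractedMessage[][]): Map<string, ExtractedMessage> {
  const merged = new Map<string, ExtractedMessage>();
  for (const messages of perEntry) {
    for (const msg of messages) {
      const existing = merged.get(msg.id);
      if (existing) {
        existing.origin.push(...msg.origin);
        // Keep the first explicit message text that was seen.
        if (existing.message === undefined) existing.message = msg.message;
      } else {
        // Copy so later merges don't mutate the worker's result.
        merged.set(msg.id, { ...msg, origin: [...msg.origin] });
      }
    }
  }
  return merged;
}
```

Since the results are plain objects and arrays, they pass through `postMessage` without any custom serialization.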

timofei-iatsenko • Oct 30 '23

Got it. How about making each worker extract one entry? Entries seem isolated from each other.

yunsii • Oct 30 '23

In your very first message you pointed to the right place in the source code that should be parallelized. Start from there.

timofei-iatsenko • Oct 30 '23

I know that Vitest uses Piscina (https://www.npmjs.com/package/piscina) instead of jest-worker. Piscina is far more robust than jest-worker, so it could be a good addition here.

semoal • Nov 07 '23