geotiff.js
Threads.js Bundling
I have been thinking more about this and I realized that I don't want to ask upstream-users of my library to bundle the workers from threads.js via webpack or parcel. It seems that there isn't an accepted way to write a "library" using this (see https://github.com/andywer/threads.js/issues/211 and https://github.com/andywer/threads.js/issues/232#issuecomment-613159302) since you can't get a single bundle easily. If I have this wrong, I'd love to know how to get this to work, but it seems that there isn't an easy way to publish a library that uses threads.js without having upstream users handle the implicit code-splitting. Again, if I have this wrong, please let me know - despite the bundling, threads.js is far easier to understand/use!
I reverted to using the old web workers from an old release of GeoTIFF in my application: https://github.com/hubmapconsortium/vitessce-image-viewer/pull/160. In my testing via `npm pack`, I don't seem to have any issues with the bundling, and it works well. I understand the ease of parcel and threads.js compared with webpack and raw web workers, but I wanted to leave this here for anyone else who might encounter this or, if changes are desired, to help start a roadmap for them.
CC: @PacoDu @constantinius
Hi @ilan-gold
With the previous "solution", i.e. webpack, we had a very similar problem. It required users to use the "worker-loader" plugin with a weird configuration to work. The underlying issue is that WebWorkers are weird to work with, especially with bundling tools.
I'm sorry this is now causing issues for you. Currently I have no clear and easy remedy, nor a roadmap to fix this.
FWIW @constantinius I never had that issue in the browser (having to bundle with `worker-loader`), either with my own bundle using `geotiff` or with upstream applications that used my bundle when it used `geotiff`. I also don't see that problem now that I am rolling my own web workers. The inlining that webpack does seems to work (even though webpack produces a separate bundle, it does not seem to be used), and the performance is outstanding. It's possible Node is different. It's also possible I am not actually using a pool of workers like I think I am, but the zoom performance I get for large-scale imagery is nearly perfectly smooth, which suggests that processing is happening off the main thread. I guess there could be a `worker-loader` hidden away somewhere I don't know about, but I was bundling my application with `rollup` for a while when you had web workers here in `geotiff`, with no issue.
Hi. I'm by no means an expert in web workers, threads, etc., but wanted to share some notes on how we got the new version of geotiff.js to work with GeoRaster which uses Webpack.
We run a script to replace the parcel-friendly code in the geotiff.js source code, `import 'threads/register';`, with the Webpack-friendly `import { Worker } from 'threads';` (https://github.com/GeoTIFF/georaster/blob/master/package.json#L21) before running `npm run build`.
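For anyone who wants to try the same import swap, it could be scripted roughly as below. This is a minimal sketch and not the actual GeoRaster script: the helper name `swapThreadsImport`, the regular expression, and the sample input are all illustrative assumptions. Only the two import statements come from the description above.

```javascript
// Hypothetical helper (not taken from GeoRaster): swaps the parcel-style
// side-effect import for the explicit Worker import.
const PARCEL_IMPORT = /import\s+['"]threads\/register['"];?/g;
const WEBPACK_IMPORT = "import { Worker } from 'threads';";

function swapThreadsImport(source) {
  // Sources that do not contain the parcel-style import are returned unchanged.
  return source.replace(PARCEL_IMPORT, WEBPACK_IMPORT);
}

// Illustrative input; a real script would read and write the file on disk.
const before = "import 'threads/register';\nimport { Pool } from 'threads';\n";
console.log(swapThreadsImport(before));
```

Keeping the rewrite in a pure string function makes it easy to check before pointing it at files inside `node_modules`.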
In our webpack.config.js we use the ThreadsPlugin, set `externals['tiny-worker'] = 'tiny-worker'` on node builds (https://github.com/GeoTIFF/georaster/blob/master/webpack.config.js#L29) and set `node['threads/register'] = 'empty';` (https://github.com/GeoTIFF/georaster/blob/master/webpack.config.js#L34).
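Put together, those three settings might look something like the sketch below. It is not a copy of the GeoRaster config: the `makeConfig` function, the `targetNode` flag, and the entry path are assumptions made for illustration, and the plugin class is passed in as an argument so the sketch has no hard dependency on `threads-plugin`. Whether the `node` setting applies to both targets is not stated above; this sketch applies it to both.

```javascript
// Hypothetical config factory; ThreadsPlugin is injected by the caller.
function makeConfig({ targetNode, ThreadsPlugin }) {
  const config = {
    entry: './src/index.js', // assumed entry path, adjust to your project
    target: targetNode ? 'node' : 'web',
    plugins: [new ThreadsPlugin()],
    externals: {},
    // Assumption: applied to every target (see the note above).
    node: { 'threads/register': 'empty' },
  };
  if (targetNode) {
    // Only node builds leave tiny-worker as an external dependency.
    config.externals['tiny-worker'] = 'tiny-worker';
  }
  return config;
}
```

In a real `webpack.config.js` this would presumably be wired up as `module.exports = makeConfig({ targetNode: true, ThreadsPlugin: require('threads-plugin') })`.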
The underlying issue is that WebWorkers are weird to work with, especially with bundling tools.
I couldn't agree more! :-)
Perhaps we could work together on a library that solves these issues at a more fundamental level? Are there any examples of libraries that try to bring threads and worker-loader together similar to how cross-fetch made it easier to use whatwg-fetch and node-fetch? Happy to pitch in and open to your ideas :-)
Hey, I agree that web workers are difficult to bundle, and I currently don't know what the best option is. I wasn't pleased to introduce the ThreadsPlugin dependency 😕, and I agree that this is not ideal.
I think you only need the ThreadsPlugin if you want to build from source. If you import the prebuilt worker from dist-node or dist-browser, you shouldn't have to rebuild the worker with ThreadsPlugin (it would require some tests to confirm that this is true). For example, building a NodeJS application should not require the ThreadsPlugin import. But most of the commonly used frontend frameworks (React, VueJS, ...) use webpack and target the `module` entry in the package.json of the library, if available.
Maybe we could address this issue by providing a third bundle that compiles the module in ESM format and applies the changes done by the ThreadsPlugin. That would avoid the upstream dependency and also prevent collisions with clients that use raw WebWorkers or another library that overwrites the Worker constructor, as threads.js does. But parcel is too limited to tweak the build in that way, so we should probably try to do these builds with webpack or rollup. (https://github.com/purtuga/esm-webpack-plugin)
We run a script to replace the parcel-friendly code in the geotiff.js source code, `import 'threads/register';`, with the Webpack-friendly `import { Worker } from 'threads';` (https://github.com/GeoTIFF/georaster/blob/master/package.json#L21) before running `npm run build`.
I think this workaround is not necessary with the latest version :D (#145, #144). Could you confirm?
@PacoDu This was not my experience with threads.js - I needed to have the plug-in serve the worker file out of my `node_modules` folder. I think that is what we determined was the way forward in #143. Maybe you could help, but what was wrong with the Web Workers bundling we had before? I never fully understood that.
I wonder if an option in this space is this library - https://github.com/developit/web-worker
@ilan-gold sorry for not answering earlier. If I recall correctly, the worker was simply not compatible with NodeJS.
@rowanwins thanks for pointing to this library! I took a quick look and I see a really interesting point in the documentation of web-worker: `const url = new URL('./worker.js', import.meta.url);` (https://github.com/developit/web-worker#usage-example)
👉 Notice how new URL('./worker.js', import.meta.url) is used above to load the worker relative to the current module instead of the application base URL. Without this, Worker URLs are relative to a document's URL, which in Node.js is interpreted to be process.cwd().
I think this could solve our issue of serving the web worker to frontend development servers. And if this is compatible with threads.js, we could keep the nice interface that threads.js offers, or switch to web-worker if it doesn't work with threads.js.
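To make the difference concrete, here is a small sketch of how the base URL changes where `./worker.js` resolves. The two base URLs are made-up examples standing in for `import.meta.url` and the document URL; only the `new URL(relative, base)` call itself comes from the web-worker documentation quoted above.

```javascript
// Resolve a worker script relative to a given base URL.
function resolveWorker(base) {
  return new URL('./worker.js', base).href;
}

// Hypothetical locations, for illustration only.
const moduleUrl = 'https://example.com/node_modules/some-lib/dist/index.js';
const documentUrl = 'https://example.com/app/index.html';

// Relative to the module (what import.meta.url would give inside the library):
console.log(resolveWorker(moduleUrl)); // https://example.com/node_modules/some-lib/dist/worker.js

// Relative to the page (the default for a bare './worker.js'):
console.log(resolveWorker(documentUrl)); // https://example.com/app/worker.js
```

The second result is the one that breaks for libraries, since the worker file lives next to the library's module rather than next to the application's page.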
No worries. That package looks very interesting. I had no idea about the Node usage. Still, for the browser, I think we'd want to do inline bundling since upstream applications would not want to worry about dealing with that. If this handles that cleanly, though, even without inline bundling, then I think this is a good fix. FWIW I think that keeping the `Pool` API exposed is good so that you can use custom decompression when parsing the `fileDirectory`.
Hey there! I don't know if you saw it already or if it's still an urgent topic for you, but threads.js now supports inlining worker code using BlobWorker, so you can ship master and worker code together in a single module.
Only gotcha right now: When bundling you need to run webpack twice. First bundle the worker(s) and then run webpack again for the main entrypoint code.
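For readers unfamiliar with the idea of inlining worker code, the underlying browser technique can be sketched with a plain Blob and an object URL. To be clear, this is not the threads.js BlobWorker API, just the general approach it presumably builds on. The worker source string and the `createInlineWorkerUrl` helper are assumptions made up for this example.

```javascript
// Worker source kept as a string, as a bundler's raw/text loader might provide it.
const workerSource = `
  self.onmessage = (event) => {
    self.postMessage(event.data * 2);
  };
`;

// Hypothetical helper: wraps the source in a Blob and mints an object URL for it.
function createInlineWorkerUrl(source) {
  const blob = new Blob([source], { type: 'text/javascript' });
  return URL.createObjectURL(blob);
}

const inlineUrl = createInlineWorkerUrl(workerSource);

// In a browser, the URL can then be handed to the Worker constructor:
//   const worker = new Worker(inlineUrl);
//   worker.postMessage(21);
```

Because the worker code travels inside the main bundle as a string, downstream applications would not need to serve or bundle a separate worker file.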
@andywer thanks for the notice! I think we now use Parcel for bundling.
@PacoDu Could you have a look at this to potentially use this for our case?
Hey @andywer, thanks! @constantinius Sure, this should solve our issue. I can't work on this right now, but I'll try to take a look in the coming weeks.
Thanks, much appreciated