WebAssembly implementation of Zarr

Open jakirkham opened this issue 7 years ago • 26 comments

Would be really nice to have a WebAssembly implementation of Zarr. This could make it possible to load Zarr files in the browser for viewing or for computation. It could also be useful for working with in-memory Zarr objects in the browser. Given there has already been some good work to get Python and NumPy into the browser using Emscripten, it may be possible to just run the Python Zarr implementation in the browser. Compression algorithms that are not in the Python standard library, however, will require getting Blosc and Numcodecs into WebAssembly. ( https://github.com/Blosc/c-blosc/issues/238 )
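As a rough illustration of what an in-memory Zarr object involves when only standard-library compression is available, here is a minimal sketch. It assumes the Zarr v2 layout (a `.zarray` JSON document plus chunks keyed like `0.1`) and the `zlib` compressor; the `store` dict and the `write_chunk` / `read_chunk` helpers are hypothetical names for this example, not part of any Zarr library.

```python
import json
import struct
import zlib

# Hypothetical in-memory store: Zarr v2 keys mapped to raw bytes.
store = {}

# Array metadata in the Zarr v2 layout: a 4x4 little-endian int32 array
# split into 2x2 chunks, each compressed with zlib (standard library only).
store[".zarray"] = json.dumps({
    "zarr_format": 2,
    "shape": [4, 4],
    "chunks": [2, 2],
    "dtype": "<i4",
    "compressor": {"id": "zlib", "level": 1},
    "fill_value": 0,
    "order": "C",
    "filters": None,
}).encode()


def write_chunk(store, i, j, values):
    """Encode four int32 values as the compressed chunk at grid position (i, j)."""
    store[f"{i}.{j}"] = zlib.compress(struct.pack("<4i", *values), 1)


def read_chunk(store, i, j):
    """Decode chunk (i, j); a missing chunk is filled with fill_value."""
    meta = json.loads(store[".zarray"])
    n = meta["chunks"][0] * meta["chunks"][1]
    raw = store.get(f"{i}.{j}")
    if raw is None:
        return [meta["fill_value"]] * n
    return list(struct.unpack(f"<{n}i", zlib.decompress(raw)))


write_chunk(store, 0, 1, [1, 2, 3, 4])
```

Anything beyond `zlib` (Blosc in particular) is where a compiled codec would have to be brought into WebAssembly.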

Note: WebAssembly is well supported. For older browsers one can convert WebAssembly to asm.js, which is pretty well supported. In the worst case, asm.js is still valid JavaScript, so it can be run as plain JavaScript (albeit slowly).

jakirkham avatar Aug 04 '18 20:08 jakirkham

It's worth noting that Rust can compile to WebAssembly. This is a built-in feature of the Rust compiler. There is an N5 implementation for Rust, which could be a good place to start.

cc @aschampion

jakirkham avatar Aug 10 '18 17:08 jakirkham

I'm planning on creating a simple WASM-compatible flag (or new dependent crate) for the Rust N5 implementation at some point. We plan to use this to improve viewing N5 volumes in CATMAID. Initially this should be as simple as adding a new backend that uses the fetch API instead of the filesystem, and disabling compression modes that don't compile to WASM.
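The backend swap described above can be sketched as follows. This is a Python analogy rather than the Rust N5 crate's actual design: `DirectoryBackend`, `HttpBackend`, `http_get` and `read_attributes` are hypothetical names, and the only format detail assumed is that N5 keeps its metadata in `attributes.json`.

```python
import json
from pathlib import Path
from typing import Callable, Optional
from urllib.error import HTTPError
from urllib.request import urlopen


class DirectoryBackend:
    """Reads keys as files below a root directory."""

    def __init__(self, root):
        self.root = Path(root)

    def get(self, key: str) -> Optional[bytes]:
        path = self.root / key
        return path.read_bytes() if path.is_file() else None


def http_get(url: str) -> Optional[bytes]:
    """Fetch a URL, treating 404 as a missing key."""
    try:
        with urlopen(url) as response:
            return response.read()
    except HTTPError as err:
        if err.code == 404:
            return None
        raise


class HttpBackend:
    """Reads keys over HTTP; the fetch function is injectable (e.g. for tests)."""

    def __init__(self, base_url: str, fetch: Callable[[str], Optional[bytes]] = http_get):
        self.base_url = base_url.rstrip("/")
        self.fetch = fetch

    def get(self, key: str) -> Optional[bytes]:
        return self.fetch(f"{self.base_url}/{key}")


def read_attributes(backend) -> dict:
    """Reader code depends only on backend.get(), not on where the bytes live."""
    raw = backend.get("attributes.json")
    return json.loads(raw) if raw is not None else {}
```

Because the reader only calls `get()`, a browser build could swap in a fetch-based backend without touching the decoding logic.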

aschampion avatar Aug 11 '18 16:08 aschampion

Related: https://gitter.im/zarr-developers/community?at=5ddc7f1c55bbed7ade461091 https://github.com/gzuidhof/zarr.js

Not WASM but aims to make zarr files accessible from the browser

dhirschfeld avatar Dec 01 '19 02:12 dhirschfeld

Also noting there is a Scala implementation that can be compiled to JavaScript. Relevant discussion and more info in issue ( https://github.com/zarr-developers/community/issues/15 ). That said, I'm not aware of a good path from Scala to WebAssembly.

jakirkham avatar Dec 04 '19 20:12 jakirkham

I'd be interested in contributing to a rust and wasm implementation of zarr. Would anyone like to collaborate on this?

jessekv avatar Apr 07 '20 09:04 jessekv

A wasm implementation of Zarr-Blosc decompression is here:

https://github.com/Kitware/itk-vtk-viewer/blob/7f82bbff02b6e8d847c76457fc07979be07c7ad5/src/bloscZarrDecompress.js

If there is interest, this could be separated out into a new package, and the corresponding compression function added (it has already been implemented in C/Emscripten). It would make sense to use this as the decompression for JavaScript / TypeScript libraries like @gzuidhof's zarr.js or @freeman-lab's zarr-js.

This implementation supports all Blosc codecs. It also uses a pool of web workers to decompress a set of chunks in parallel and optimize wasm compilation.
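The worker-pool idea can be sketched in a few lines. This is not the itk-vtk-viewer code: it is a Python analogy in which a thread pool stands in for the web workers and `zlib` stands in for Blosc, and `decompress_chunks` is a hypothetical helper name.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def decompress_chunks(compressed, max_workers=4):
    """Decompress a {chunk_key: compressed_bytes} mapping using a worker pool.

    The result maps the same keys to the decompressed bytes.
    """
    keys = list(compressed)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as its inputs.
        decoded = pool.map(zlib.decompress, (compressed[k] for k in keys))
        return dict(zip(keys, decoded))
```

A viewer would typically hand the pool only the chunks needed for the current view, then assemble them once all have been decoded.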

Here is what it looks like in action:

https://kitware.github.io/itk-vtk-viewer/app/?fileToLoad=https://thewtex.github.io/allen-ccf-itk-vtk-zarr/average_template_50_chunked.zarr

thewtex avatar Apr 07 '20 21:04 thewtex

Here is what it looks like in action:

https://kitware.github.io/itk-vtk-viewer/app/?fileToLoad=https://thewtex.github.io/allen-ccf-itk-vtk-zarr/average_template_50_chunked.zarr

Wow, that is mega cool. What is it?

alimanfoo avatar Apr 08 '20 13:04 alimanfoo

@thewtex It would be awesome to make it a separate package. I am new to WebAssembly, but I'd be happy to contribute where I can.

jessekv avatar Apr 08 '20 14:04 jessekv

Wow, that is mega cool. What is it?

A brain atlas averaged from 1675 mice :mouse: :mouse: :mouse:

@thewtex It would be awesome to make it a separate package. I am new to WebAssembly, but I'd be happy to contribute where I can.

@vdwees Great, we'll create a package, your help is appreciated.

thewtex avatar Apr 09 '20 02:04 thewtex

I just made numcodecs.js public which has a WASM blosc codec. Hopefully this will help others use blosc in their applications!

EDIT: It's a javascript module meant to be run in the browser and Node.

manzt avatar May 26 '20 20:05 manzt

I just made numcodecs.js public which has a WASM blosc codec. Hopefully this will help others use blosc in their applications!

Nice work! Thanks for sharing @manzt. I wonder how hard it is to get Zarr usable from WebAssembly then (as it is pure Python at that point)

cc @rth @mdboom (who may be interested ;)

jakirkham avatar May 26 '20 21:05 jakirkham

I wonder how hard it is to get Zarr usable from WebAssembly then (as it is pure Python at that point)

If it is pure Python (and has pure Python wheels) you could install it from PyPI with Pyodide, but you would still need to write some code to interact with those JS/WASM libraries in the places where it currently uses other Python packages with C extensions.

rth avatar May 27 '20 19:05 rth

Hey there, I am very interested in the Zarr format, so I am available to create a WebAssembly/Rust lib, but I would like to implement the v3 spec directly. After reading some of the different topics in the Zarr spec repo, the v3 spec seems pretty great. Do you think I could start on an implementation? Or should I wait for the Python implementation first?

Update: Sorry but I cannot work on this project due to some terms of my employment contract 😞

Farkal avatar Oct 14 '20 22:10 Farkal

Hey @Farkal, welcome! That sounds great! 😄

Would be nice to have other people trying out the spec in other languages. This can help inform whether what we have in the spec makes sense or if it needs further modification. FWIW there is a WIP Python implementation here ( https://github.com/alimanfoo/zarrita ). Also we have been engaging with some folks from QuantStack on the C++ side and with NetCDF on the C side. It would be really interesting to see whether things make sense from the WebAssembly/Rust side.

We also have a weekly spec meeting (details in issue https://github.com/zarr-developers/community/issues/33 ) if you would be interested in stopping by. Would be nice to say hi and learn a bit more about what you are working on as well as how we can help 🙂

jakirkham avatar Oct 15 '20 00:10 jakirkham

FYI: zarr-python and numcodecs are compiled into WASM and available as modules in Pyodide after this PR.

Click here to try a live demo with Pyodide + zarr running completely in the browser. (Only works with Chrome or Firefox)

A next goal is to add a custom storage backend for Pyodide so that we can load Zarr arrays via HTTP. However, due to browser limitations, we cannot use fsspec with its HTTP backend directly. To enable this, we are currently working on the asyncio event loop, and we will likely also need to wait until multi-threading is supported in Pyodide.

oeway avatar Jan 08 '21 14:01 oeway

Very cool! Thanks for sharing Wei 😄

cc @martindurant (who may be interested in fsspec usage)

jakirkham avatar Jan 08 '21 16:01 jakirkham

This was discussed a bit on gitter. fsspec for Pyodide seems like a big benefit with or without zarr, but indeed the sync/thread stuff adds complexity that in this environment I don't think I'm in the best place to tackle. Happy to help test, though!

martindurant avatar Jan 08 '21 18:01 martindurant

It looks like Pyodide is including Zarr & Numcodecs, which is cool to see 😄

jakirkham avatar Feb 17 '22 20:02 jakirkham

@jakirkham : where does that leave this issue? :)

joshmoore avatar Feb 18 '22 20:02 joshmoore

It looks like Pyodide is including Zarr & Numcodecs, which is cool to see 😄

I added those two libraries to Pyodide a while ago. It works with in-memory data, but it is still very limited for any real application because we cannot support remote storage backends.

Not sure whether this has already been discussed in the Zarr community, but the key feature needed to make that work is support for an async store (with asyncio). The native Python implementation of fsspec uses threading to convert async calls into sync ones, but multi-threading is not yet supported in Pyodide. Remote storage will therefore only work if Zarr supports an async store (meaning the getitem function will be async).
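To make the async-store idea concrete, here is a minimal sketch. It is not the zarr-python or async-zarr API: `AsyncMemoryStore` and `load_chunks` are hypothetical names, and the network round trip is simulated with `asyncio.sleep`. The point is that the caller awaits `getitem` on the event loop, so no helper thread is needed to bridge async and sync code.

```python
import asyncio


class AsyncMemoryStore:
    """Hypothetical store whose getitem is a coroutine."""

    def __init__(self, data, latency=0.01):
        self._data = data
        self._latency = latency

    async def getitem(self, key):
        # Stands in for a network round trip (e.g. an HTTP GET of one chunk).
        await asyncio.sleep(self._latency)
        return self._data[key]


async def load_chunks(store, keys):
    """Request all chunks concurrently on a single event loop."""
    keys = list(keys)
    values = await asyncio.gather(*(store.getitem(k) for k in keys))
    return dict(zip(keys, values))
```

In regular Python this would be driven with `asyncio.run(load_chunks(store, keys))`; in a browser environment the coroutine would instead be awaited on the event loop that is already running.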

oeway avatar Feb 18 '22 21:02 oeway

Working on async zarr at https://github.com/martindurant/async-zarr as part of a company hack week

martindurant avatar Aug 10 '22 14:08 martindurant

Already works in normal python asyncio, and maybe works in pyscript too, just need to write some HTML or something...

martindurant avatar Aug 10 '22 15:08 martindurant

Working on async zarr at https://github.com/martindurant/async-zarr as part of a company hack week

Thanks for sharing the details, @martindurant. May I know the exact dates for the hack week? I'd like to post publicly about this to invite more contributors.

sanketverma1704 avatar Aug 11 '22 23:08 sanketverma1704

IIUC it is a hack week Anaconda is running for its employees

jakirkham avatar Aug 12 '22 07:08 jakirkham

That's correct; and the hack is now over. I made this little video of the current state: https://drive.google.com/file/d/1Ll-Lr_3Ckf_-WIlBkIPx4H8Kmz9lz4b9/view?usp=sharing

martindurant avatar Aug 12 '22 13:08 martindurant

Thanks a lot, @martindurant. This is great. I'll share this across our social media so that we can get the word out and invite new users and contributors.

sanketverma1704 avatar Aug 15 '22 22:08 sanketverma1704