h5wasm
Lazy loading in backend environment? (Deno)
I have a backend server where I want to lazy load a dataset to limit my bandwidth usage. Based on #4 I tried to implement it like so:
try {
const Modules = await h5wasm.ready;
const { FS } = Modules;
FS.createLazyFile("/", "current.h5", signedUrl, true, false);
const file = new h5wasm.File("current.h5");
console.log(file);
} catch (err) {
console.error(err);
}
My file is stored in an S3-compatible storage bucket that supports range requests, but this doesn't seem to work, as I get:
TypeError: Cannot read properties of null (reading 'length')
at FSNode.get [as usedBytes] (file:///.../AppData/Local/deno/npm/registry.npmjs.org/h5wasm/0.7.1/dist/esm/hdf5_util.js:8:3748767)
at Object.getattr (file:///.../AppData/Local/deno/npm/registry.npmjs.org/h5wasm/0.7.1/dist/esm/hdf5_util.js:8:3707616)
at stat (file:///.../AppData/Local/deno/npm/registry.npmjs.org/h5wasm/0.7.1/dist/esm/hdf5_util.js:8:3732696)
at Object.doStat (file:///.../AppData/Local/deno/npm/registry.npmjs.org/h5wasm/0.7.1/dist/esm/hdf5_util.js:8:3750096)
at ___syscall_fstat64 (file:///.../AppData/Local/deno/npm/registry.npmjs.org/h5wasm/0.7.1/dist/esm/hdf5_util.js:8:3752899)
at <anonymous> (wasm://wasm/00a81936:1:557751)
at <anonymous> (wasm://wasm/00a81936:1:273169)
at <anonymous> (wasm://wasm/00a81936:1:2316618)
at <anonymous> (wasm://wasm/00a81936:1:252235)
at <anonymous> (wasm://wasm/00a81936:1:295992)
I think my environment is not set up properly, as it is not web-based. Is there anything else needed to configure lazy URL-based access in Deno (and Node)?
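Before digging into the runtime, it may be worth confirming that the signed URL really answers range requests when called from Deno. The following is only a minimal sketch under that assumption; rangeHeader, parseContentRange and supportsRange are hypothetical helper names, not part of h5wasm or Emscripten:

```typescript
// Hypothetical helpers for illustration; not part of h5wasm or Emscripten.

// Build an HTTP Range header value for `length` bytes starting at `start`.
function rangeHeader(start: number, length: number): string {
  return `bytes=${start}-${start + length - 1}`;
}

// Parse a Content-Range value such as "bytes 0-7/1048576".
// Returns null when the value does not have the expected shape.
function parseContentRange(
  value: string,
): { start: number; end: number; total: number | null } | null {
  const m = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(value.trim());
  if (!m) return null;
  return {
    start: Number(m[1]),
    end: Number(m[2]),
    total: m[3] === "*" ? null : Number(m[3]),
  };
}

// Minimal shape of the response this check needs; the global fetch fits it.
interface RangeResponse {
  status: number;
  headers: { get(name: string): string | null };
  arrayBuffer(): Promise<ArrayBuffer>;
}
type FetchLike = (
  url: string,
  init?: { headers?: Record<string, string> },
) => Promise<RangeResponse>;

// Ask for the first 8 bytes and check for a 206 Partial Content answer.
async function supportsRange(url: string, fetchImpl: FetchLike): Promise<boolean> {
  const res = await fetchImpl(url, { headers: { Range: rangeHeader(0, 8) } });
  await res.arrayBuffer(); // read the (small) body so the connection is released
  const contentRange = res.headers.get("Content-Range");
  return res.status === 206 && contentRange !== null &&
    parseContentRange(contentRange) !== null;
}
```

Calling supportsRange(signedUrl, fetch) should resolve to true if the bucket honours the Range header for that URL. A false result would point at the storage side rather than at h5wasm.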
I'm not completely sure. The createLazyFile function seems to have been written specifically for the browser context, as it uses new XMLHttpRequest... in the code (see https://github.com/emscripten-core/emscripten/blob/53f661cb11ba849403c060b97208f88775484d98/src/library_fs.js#L1678C1-L1678C42)
You might be able to use this shim library to get it to work: https://www.npmjs.com/package/xmlhttprequest
That could work. Do you have an idea of how to patch it in? I don't think the FS comes bundled with this lib, right?
Deno apparently has a solution for XHR, and with it I was able to get past your first error:
> let xhr = await import("https://deno.land/x/[email protected]/mod.ts")
undefined
> try {
const Modules = await h5wasm.ready;
const { FS } = Modules;
FS.createLazyFile("/", "current.h5", signedUrl, true, false);
const file = new h5wasm.File("current.h5");
console.log(file);
} catch (err) {
console.error(err);
}
Cannot do synchronous binary XHRs outside webworkers in modern browsers. Use --embed-file or --preload-file in emcc
undefined
More investigation required... I don't know the current status of web workers in Deno.
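Since the error changed once an XMLHttpRequest global was present, it can help to log which browser-style globals the current scope exposes, both on the main thread and inside a worker. This is a rough diagnostic sketch only; probeRuntime is a hypothetical helper, and which of these globals Emscripten actually inspects is an assumption, not something confirmed in this thread:

```typescript
// Hypothetical diagnostic; not part of h5wasm or Emscripten.
interface RuntimeProbe {
  hasXMLHttpRequest: boolean; // true once an XHR implementation or polyfill is loaded
  hasImportScripts: boolean; // classic-worker global (assumed relevant to worker detection)
  hasWorker: boolean; // whether this scope can spawn workers itself
}

// Inspect a scope object (normally globalThis) without touching the network.
function probeRuntime(scope: Record<string, unknown>): RuntimeProbe {
  return {
    hasXMLHttpRequest: typeof scope["XMLHttpRequest"] === "function",
    hasImportScripts: typeof scope["importScripts"] === "function",
    hasWorker: typeof scope["Worker"] === "function",
  };
}

// Run this once on the main thread and once inside the worker to compare.
console.log(probeRuntime(globalThis as unknown as Record<string, unknown>));
```

Comparing the output from both places shows whether the polyfill is visible inside the worker at all and whether the worker scope looks different from the main thread.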
Hm, yeah, it gets as far as your error. Deno has built-in support for web workers, but using them does not seem to make a difference. I am trying to see if I can get something to work as well.
// main.ts
const worker = new Worker(
new URL("./worker.ts", import.meta.url).href,
{
type: "module",
},
);
worker.postMessage({ example: 'hello world' });
// worker.ts
self.onmessage = async (e) => {
const Modules = await h5wasm.ready;
const { FS } = Modules;
FS.createLazyFile("/", "current.h5", signedUrl, true, false);
// ^ results in the same error
const file = new h5wasm.File("current.h5");
self.close();
};
It looks like Deno doesn't support synchronous fetch/xhr even in a web worker. I think the sync flag is required for the createLazyFile implementation in Emscripten, and it's the test for that flag that is failing and throwing the current error. I don't see any way to do a synchronous fetch in Deno, and I don't see any way to do an async file read in HDF5 (without writing a new Virtual File Driver), so I'm not sure if there's an easy path forward. If you find something, please let me know!
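If lazy loading is not an option, one fallback is to download the whole file asynchronously and write it into the in-memory file system before opening it. This gives up the bandwidth savings that were the point of the question, so it is only a stopgap. The sketch below assumes that FS.writeFile accepts a Uint8Array; storeBuffer and downloadIntoFS are hypothetical helper names:

```typescript
// Hypothetical helpers; only FS.writeFile and h5wasm.File come from the libraries.

// The one file-system method this sketch relies on.
interface WritableFS {
  writeFile(path: string, data: Uint8Array): void;
}

// Minimal response shape; the global fetch returns something compatible.
interface BinaryResponse {
  ok: boolean;
  status: number;
  arrayBuffer(): Promise<ArrayBuffer>;
}
type FetchBinary = (url: string) => Promise<BinaryResponse>;

// Copy a downloaded buffer into the virtual file system; returns the byte count.
function storeBuffer(fs: WritableFS, path: string, buffer: ArrayBuffer): number {
  const bytes = new Uint8Array(buffer);
  fs.writeFile(path, bytes);
  return bytes.byteLength;
}

// Download the complete file (no range requests) and store it under `path`.
async function downloadIntoFS(
  fs: WritableFS,
  fetchImpl: FetchBinary,
  url: string,
  path: string,
): Promise<number> {
  const res = await fetchImpl(url);
  if (!res.ok) throw new Error(`download failed with HTTP ${res.status}`);
  return storeBuffer(fs, path, await res.arrayBuffer());
}

// Intended usage with the names from this thread (not executed here):
//   const { FS } = await h5wasm.ready;
//   await downloadIntoFS(FS, fetch, signedUrl, "current.h5");
//   const file = new h5wasm.File("current.h5", "r");
```

The file then lives entirely in memory, so this is only reasonable for files that fit comfortably in the server's RAM.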
I haven't found a way to make it work using your lib, but made a workaround using gdal-async. It has support for writing COG GeoTIFF files (https://www.cogeo.org/), which also allow on-demand loading of the dataset. Then, using geoblaze and georaster, I am able to slice the needed values of my dataset on demand.
Never mind, it has the same issue :/
Well, there seems to be an interesting difference in the way dependencies are handled in Deno. For example, importing georaster using npm:georaster vs https://esm.sh/georaster results in different outcomes for worker-based requests: the esm version does not work, while the npm: import works fine.
By the way, I don't know if this makes sense for the lib, but also publishing on JSR might simplify the build flow for different runtimes. I am not sure, since I have never used any lib from JSR, but it seems cool.
Thanks for the tip... I'll check out JSR.
Is it possible the npm: import is using a different implementation of a web worker, instead of the one distributed with Deno?
Yeah, it seems like it, but I can't figure out how to change it. I've worked on another solution using some Node.js serverless functions to handle this as a service.