h5wasm

Lazy loading in backend environment? (Deno)

Open aLemonFox opened this issue 11 months ago • 10 comments

I have a backend server where I want to lazy-load a dataset to limit bandwidth usage. Based on #4, I tried to implement it like so:

try {
  const Modules = await h5wasm.ready;
  const { FS } = Modules;

  FS.createLazyFile("/", "current.h5", signedUrl, true, false);
  const file = new h5wasm.File("current.h5");

  console.log(file);
} catch (err) {
  console.error(err);
}

My file is stored in an S3-compatible storage bucket that supports range requests, but this doesn't seem to work. I get:

TypeError: Cannot read properties of null (reading 'length')
    at FSNode.get [as usedBytes] (file:///.../AppData/Local/deno/npm/registry.npmjs.org/h5wasm/0.7.1/dist/esm/hdf5_util.js:8:3748767)
    at Object.getattr (file:///.../AppData/Local/deno/npm/registry.npmjs.org/h5wasm/0.7.1/dist/esm/hdf5_util.js:8:3707616)
    at stat (file:///.../AppData/Local/deno/npm/registry.npmjs.org/h5wasm/0.7.1/dist/esm/hdf5_util.js:8:3732696)
    at Object.doStat (file:///.../AppData/Local/deno/npm/registry.npmjs.org/h5wasm/0.7.1/dist/esm/hdf5_util.js:8:3750096)
    at ___syscall_fstat64 (file:///.../AppData/Local/deno/npm/registry.npmjs.org/h5wasm/0.7.1/dist/esm/hdf5_util.js:8:3752899)
    at <anonymous> (wasm://wasm/00a81936:1:557751)
    at <anonymous> (wasm://wasm/00a81936:1:273169)
    at <anonymous> (wasm://wasm/00a81936:1:2316618)
    at <anonymous> (wasm://wasm/00a81936:1:252235)
    at <anonymous> (wasm://wasm/00a81936:1:295992)

I think my environment is not set up properly, as it is not web-based. Is there anything else needed to configure lazy URL-based access in Deno (and Node)?
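
Before debugging the lazy file itself, it can help to confirm that the signed URL really honours byte-range requests from the backend. The following is a minimal sketch of that check; `rangeHeader` and `parseContentRange` are hypothetical helper names and not part of h5wasm, and `signedUrl` is the variable from the snippet above.

```typescript
// Hypothetical helpers (not part of h5wasm) for checking byte-range support.

/** Build an HTTP Range header value for `length` bytes starting at `start`. */
function rangeHeader(start: number, length: number): string {
  if (!Number.isInteger(start) || !Number.isInteger(length) || start < 0 || length <= 0) {
    throw new RangeError("start must be >= 0 and length must be > 0");
  }
  // Range end offsets are inclusive.
  return `bytes=${start}-${start + length - 1}`;
}

/** Parse a Content-Range response header such as "bytes 0-1023/4096". */
function parseContentRange(
  value: string,
): { start: number; end: number; total: number | null } | null {
  const match = /^bytes (\d+)-(\d+)\/(\d+|\*)$/.exec(value.trim());
  if (match === null) return null;
  return {
    start: Number(match[1]),
    end: Number(match[2]),
    // "*" means the server did not report the total size.
    total: match[3] === "*" ? null : Number(match[3]),
  };
}

// Usage (needs network access, so it is left commented out):
// const res = await fetch(signedUrl, { headers: { Range: rangeHeader(0, 8) } });
// console.log(res.status, parseContentRange(res.headers.get("Content-Range") ?? ""));
```

A 206 status together with a Content-Range header would indicate that the server honoured the range. A 200 status would mean it sent the whole file instead.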

aLemonFox avatar Mar 04 '24 14:03 aLemonFox

I'm not completely sure - the createLazyFile function seems to have been written specifically for the browser context, as it uses new XMLHttpRequest... in the code (see https://github.com/emscripten-core/emscripten/blob/53f661cb11ba849403c060b97208f88775484d98/src/library_fs.js#L1678C1-L1678C42)

You might be able to use this shim library to get it to work: https://www.npmjs.com/package/xmlhttprequest

bmaranville avatar Mar 04 '24 15:03 bmaranville

That could work. Do you have an idea of how to patch it? I don't think the FS comes bundled with this lib, right?

aLemonFox avatar Mar 11 '24 19:03 aLemonFox

Deno apparently has a solution for XHR, and I was able to get past your first error:

> let xhr = await import("https://deno.land/x/[email protected]/mod.ts")
undefined
> try {
  const Modules = await h5wasm.ready;
  const { FS } = Modules;

  FS.createLazyFile("/", "current.h5", signedUrl, true, false);
  const file = new h5wasm.File("current.h5");

  console.log(file);
} catch (err) {
  console.error(err);
}
Cannot do synchronous binary XHRs outside webworkers in modern browsers. Use --embed-file or --preload-file in emcc
undefined

More investigation required... I don't know the current status of web workers in Deno.
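
One way to narrow this down might be to log which browser-style globals the runtime exposes, both in the main module and inside a worker. This is only a diagnostic sketch: `probeGlobals` is a hypothetical helper, and apart from `XMLHttpRequest` (which the linked library_fs.js does use), the names in the list are assumptions about what the Emscripten-generated code may check.

```typescript
// Diagnostic sketch: report the `typeof` of selected globals in a given scope.
type GlobalReport = Record<string, string>;

function probeGlobals(scope: Record<string, unknown>, names: string[]): GlobalReport {
  const report: GlobalReport = {};
  for (const name of names) {
    // Yields "undefined" for names the scope does not define.
    report[name] = typeof scope[name];
  }
  return report;
}

// Assumed list of globals worth inspecting; adjust as needed.
const NAMES_TO_PROBE = ["XMLHttpRequest", "importScripts", "WorkerGlobalScope", "fetch"];

console.log(probeGlobals(globalThis as unknown as Record<string, unknown>, NAMES_TO_PROBE));
```

Running the same call at the top of a worker script and comparing the two reports should show whether the worker context looks any different to the library.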

bmaranville avatar Mar 11 '24 21:03 bmaranville

Hm, yeah, it gets through until your error. Deno has built-in support for web workers, but using them does not seem to make a difference. I am trying to see if I can get something to work as well.

// main.ts
const worker = new Worker(
  new URL("./worker.ts", import.meta.url).href,
  {
    type: "module",
  },
);
worker.postMessage({ example: 'hello world' });

// worker.ts
self.onmessage = async (e) => {
  const Modules = await h5wasm.ready;
  const { FS } = Modules;

  FS.createLazyFile("/", "current.h5", signedUrl, true, false);
  // ^ results in the same error
  const file = new h5wasm.File("current.h5");
  self.close();
};

aLemonFox avatar Mar 12 '24 16:03 aLemonFox

It looks like Deno doesn't support synchronous fetch/xhr even in a web worker. I think the sync flag is required for the createLazyFile implementation in Emscripten, and it's the test for that flag that is failing and throwing the current error. I don't see any way to do a synchronous fetch in Deno, and I don't see any way to do an async file read in HDF5 (without writing a new Virtual File Driver), so I'm not sure if there's an easy path forward. If you find something, please let me know!
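
For comparison, the non-lazy fallback is to download the file asynchronously and write it into the in-memory filesystem before opening it. The sketch below shows that pattern. Note that it transfers the whole file, so it does not address the bandwidth concern. `fetchIntoFS` and the small interfaces are hypothetical names introduced here so the example does not depend on h5wasm or DOM typings; only `FS.writeFile` and `h5wasm.File` in the usage comment are real calls.

```typescript
// Minimal structural types so the sketch needs no h5wasm or DOM typings.
interface WritableFS {
  writeFile(path: string, data: Uint8Array): void;
}
interface BinaryResponse {
  ok: boolean;
  status: number;
  arrayBuffer(): Promise<ArrayBuffer>;
}
type Fetcher = (url: string) => Promise<BinaryResponse>;

/**
 * Download the whole file asynchronously, then copy it into the given FS.
 * Returns the number of bytes written. (Hypothetical helper, not an h5wasm API.)
 */
async function fetchIntoFS(
  fs: WritableFS,
  path: string,
  url: string,
  fetcher: Fetcher,
): Promise<number> {
  const response = await fetcher(url);
  if (!response.ok) {
    throw new Error(`download failed with HTTP status ${response.status}`);
  }
  const bytes = new Uint8Array(await response.arrayBuffer());
  fs.writeFile(path, bytes);
  return bytes.byteLength;
}

// Usage with h5wasm (commented out: needs the package and network access):
// const { FS } = await h5wasm.ready;
// await fetchIntoFS(FS, "current.h5", signedUrl, (u) => fetch(u));
// const file = new h5wasm.File("current.h5", "r");
```

Passing the fetcher in as a parameter keeps the helper independent of the runtime, so the same code should work in Deno, Node, or a browser.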

bmaranville avatar Mar 13 '24 12:03 bmaranville

I haven't found a way to make it work using your lib, but I made a workaround using gdal-async. It supports writing COG GeoTIFF files (https://www.cogeo.org/), which also allow on-demand loading of the dataset.

Then, using geoblaze and georaster, I can slice the values I need from my dataset on demand.

Never mind, it has the same issue :/

aLemonFox avatar Mar 16 '24 15:03 aLemonFox

Well, there seems to be an interesting difference in how Deno handles dependencies. For example, importing georaster via npm:georaster versus https://esm.sh/georaster results in different outcomes for worker-based requests: the esm.sh version does not work, while the npm: import works fine.

By the way, I don't know if this makes sense for the lib, but publishing on JSR as well might simplify the build flow for different runtimes. I'm not sure, since I have never used any lib from JSR, but it seems cool.

aLemonFox avatar Mar 16 '24 16:03 aLemonFox

Thanks for the tip... I'll check out JSR.

bmaranville avatar Mar 16 '24 23:03 bmaranville

Is it possible the npm: import is using a different implementation of a web worker, instead of the one distributed with Deno?

bmaranville avatar Mar 18 '24 14:03 bmaranville

Yeah, that seems to be it, but I can't figure out how to change it. I've worked on another solution that uses some Node.js serverless functions to handle this as a service.

aLemonFox avatar Apr 04 '24 18:04 aLemonFox