byte-storage
Synchronous in workers
@lukewagner has asked me about this idea several times. What he'd mostly like to see is synchronous I/O in workers, to be able to emulate POSIX and get as close as possible to the metal and to native performance.
(Might be a duplicate of #2 but I figured I'd raise a dedicated issue.)
Related Twitter discussion https://twitter.com/reillyeon/status/886982807725121536.
I guess something like:
```js
const mmap = byteStore.mmap(name: String, {
  start: Number = 0,
  end: Number,
});
```
…would make sense. It would take a write lock on the start-end range. Dedicated & shared workers only, of course.
I'm not sure what shape mmap would be though. I'd need to do some reading.
My intuition would be for mmap to have a similar interface to ArrayBuffer – maybe even a specially backed SharedArrayBuffer? I don’t know the internals of ArrayBuffers too well, so not sure that’s possible/feasible.
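To make that concrete, a minimal usage sketch in a dedicated worker, assuming an ArrayBuffer-like return value. Everything here (byteStore, mmap(), close()) is hypothetical:

```js
// Hypothetical API — nothing below exists yet; it only illustrates the
// ArrayBuffer-like shape suggested above.
const mapping = byteStore.mmap('big-file.bin', { start: 0, end: 4096 });

// If the mapping behaves like an ArrayBuffer, ordinary typed-array views work:
const bytes = new Uint8Array(mapping);
bytes[0] = 0xff; // write through the mapping

// Releasing it would presumably drop the write lock on the mapped range:
mapping.close(); // invented method; lifetime management is an open question
```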
Interested to hear from WASM folks: should an mmap hold a write lock on the mmapped area? Or would you need to read it from another thread while mmapping it in a worker?
Considering you can share an mmap()'d fd across multiple threads in C, I'm pretty sure WASM will want to have that behavior, too.
In addition to being able to create a new AB that aliases a file's contents, it would be really nice to map a file into a subrange of an existing AB (viz., the AB of a wasm linear memory). This would minimize the amount of copying and peak vmem required to get file contents into linear memory (which is what wasm ultimately wants) and avoid creating garbage (where the garbage owns a non-trivial resource).
I think a good start might be the ability to do a read-only mapping (so semantically an eager copy, but copy-on-write (MAP_PRIVATE) under the hood). But what would be exciting, and I expect needed in the long-term for matching native performance, is a shared mutable mapping (MAP_SHARED). I'm guessing this would require both a new mutable shared object (SharedFile/Blob) and you'd have to map into a SAB. This is a bigger ask though :)
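Purely to illustrate the distinction, a speculative sketch of the two modes; the mode option and everything else here is invented:

```js
// 1. Read-only / copy-on-write (MAP_PRIVATE-like): semantically an eager
//    copy, so a plain ArrayBuffer is a plausible return type.
const privateMap = byteStore.mmap('data.bin', { mode: 'private' });

// 2. Shared mutable (MAP_SHARED-like): writes must be visible to other agents
//    and eventually to the file, so the backing would need to be a
//    SharedArrayBuffer (and the file itself a shared, mutable object).
const sharedMap = byteStore.mmap('data.bin', { mode: 'shared' });
// sharedMap could then be postMessage()'d to other workers like any SAB.
```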
I feel like we are in a golden age where we can add synchronous primitives to workers without the horrible hacks that are required on the main thread. I also feel that this golden age is going to come to a crashing halt. As more APIs are added to workers, the interactions between those APIs increase combinatorially.
Chrome's implementation of workers already features lots of ugly proxying to the main thread. It's fair to view this as legacy cruft, but it's adding a lot of implicit synchronisation to the platform that may mask problems.
@lukewagner
> it would be really nice to map a file into a subrange of an existing AB (viz., the AB of a wasm linear memory)
In this model, would the file be copied into memory, or would parts of the AB be backed by disk, and parts by memory?
I think if browsers can use the underlying OS facility to implement mmap, then the risk of deadlock will be minimised.
A concern I have is that browsers will schedule workers with the assumption that they are CPU bound. With mmap, they may actually be IO bound, and the browser won't know.
An example of why this might matter is if the browser said "ok, this ServiceWorker over here needs to be woken up, but we already have more active workers than cores, so it will just have to wait". But some of those workers are actually doing memory-mapped IO, and if the ServiceWorker had been woken up it would have been able to do useful work immediately.
> browsers will schedule workers with the assumption that they are CPU bound
A lot of early worker demos used sync XHR, so I guess workers were invented with non-CPU bounding in mind. Of course, browsers may have optimised differently.
> A lot of early worker demos used sync XHR, so I guess workers were invented with non-CPU bounding in mind. Of course, browsers may have optimised differently.
Chrome still uses one OS thread per worker. But a number of factors are pushing us towards a more active approach to scheduling.
Note that due to SharedArrayBuffer you're effectively required to use a thread per worker or else you'd violate ECMAScript's agent concept.
> In this model, would the file be copied into memory, or would parts of the AB be backed by disk, and parts by memory?
It depends on the implementation, of course, but the ideal would be the latter. That is, in the best case, you'd just mmap(MAP_FIXED, fd) into a subset of the AB's memory. There are a number of requirements for this to work (e.g., page alignment), so this requires a fair bit of investigation to ensure that the optimization can be reliably done.
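As a sketch of what that could look like from JS, here's an invented mapInto() operation targeting a subrange of a wasm memory's buffer (the name and signature are made up; the page-alignment constraint is the real takeaway):

```js
// Hypothetical: map a byte range of a stored file into a subrange of an
// existing ArrayBuffer (here, wasm linear memory). mapInto() does not exist.
const memory = new WebAssembly.Memory({ initial: 256 }); // 256 × 64 KiB pages

const offset = 1 << 16; // must be page-aligned for a MAP_FIXED-style mapping
byteStore.mapInto('assets.bin', memory.buffer, {
  offset,       // where in linear memory the file contents should appear
  start: 0,     // byte range of the file to map
  end: 1 << 20,
});
// When alignment allows, an engine could implement this as
// mmap(base + offset, len, ..., MAP_FIXED, fd, 0); otherwise it would
// fall back to a copy.
```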
Incidentally, this same sort of idea of mapping a fd into an AB has come up in the context of AR where you're trying to efficiently get pixel data from a camera into wasm/asm.js memory so you can run, e.g., OpenCV.js on the pixels.
That being said, while this would allow giving the full benefits of memory-mapped I/O to compiled wasm, I think the original idea of returning a buffer that is just the mmap of a file would be pretty decent too and could be useful to a higher-level stdio library.
> I think the original idea of returning a buffer that is just the mmap of a file would be pretty decent too
I think this is “good enough”. There’s currently no native concept of mapping multiple ABs into one “address space”, so I’d assume that we should avoid introducing this now (and it can be built fairly easily in user-space I think).
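For what it's worth, the user-space version could look roughly like this today, with plain ArrayBuffers and no new APIs; reads are routed to whichever buffer covers the requested address:

```js
// A user-space "address space" over multiple buffers: each buffer is
// registered at a base offset, and reads go to the region covering the
// requested address.
class AddressSpace {
  constructor() {
    this.regions = []; // { base, view }, kept sorted by base
  }
  map(base, buffer) {
    this.regions.push({ base, view: new Uint8Array(buffer) });
    this.regions.sort((a, b) => a.base - b.base);
  }
  readU8(addr) {
    for (const { base, view } of this.regions) {
      if (addr >= base && addr < base + view.length) return view[addr - base];
    }
    throw new RangeError(`unmapped address ${addr}`);
  }
}

const space = new AddressSpace();
space.map(0, new ArrayBuffer(4096));
space.map(4096, new ArrayBuffer(4096));
space.readU8(5000); // falls in the second buffer
```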
I don't really see the use in having a synchronous persistent storage API. One of the best things about JavaScript (and Node.js) is that we have good ways of dealing with asynchronicity where other platforms do not. Computer architectures are fundamentally asynchronous behind the scenes, and hiding that away with blocking calls makes it difficult to write systems that perform well. What I really want out of this spec is fwrite() and fread(), but asynchronous. I'm not sure where the rest of this proposal is coming from, or why it invents so many complications such as streaming and mmap. I want to store data in block-aligned chunks, like the underlying file system does, without having to wrap that in mountains of IndexedDB.
I think it might help to ground these discussions in some real projects that would see a huge perf boost with an async block storage API:
- https://github.com/mafintosh/hypercore
- https://github.com/flumedb/flumedb
These libraries need async fwrite() and fread() and nothing else, because everything else can be built on top of those basic primitives.
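For concreteness, the shape could be as small as this (byteStore and its read/write methods are placeholders, not a proposal):

```js
// Placeholder API: block reads/writes at explicit offsets, as promises.
async function fread(name, offset, length) {
  // resolve to an ArrayBuffer with `length` bytes starting at `offset`
  return byteStore.read(name, { start: offset, end: offset + length });
}

async function fwrite(name, offset, buffer) {
  // write `buffer` at `offset`, resolving once the bytes are stored
  return byteStore.write(name, buffer, { start: offset });
}
```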
Use cases are: (1) porting applications/libraries that are already using sync I/O and (2) exposing the fundamental perf wins of memory-mapped I/O. Sync I/O can already be emulated with a SAB + helper worker using IDB; it will just run (significantly) slower than native. Definitely, adding improved async APIs makes sense, and that's probably the better first step given how primitive what we have now is (especially for mutating files efficiently); I'm not commenting on priorities here. But in the limit, I think we'll want to approach native I/O performance, and something in the space of what we've described above would be useful to that end.
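For reference, a sketch of that SAB + helper-worker emulation. Atomics.wait()/Atomics.notify() are real APIs; readFromIDB() is a stub for an ordinary async IndexedDB get, and error handling and chunking are elided:

```js
// --- in the worker that wants synchronous reads ---
// `helper` is a Worker created elsewhere, running the handler below.
const signal = new Int32Array(new SharedArrayBuffer(4));
const data = new Uint8Array(new SharedArrayBuffer(4096));

function readSync(key) {
  Atomics.store(signal, 0, 0);
  helper.postMessage({ key, signal: signal.buffer, out: data.buffer });
  Atomics.wait(signal, 0, 0); // block until the helper notifies
  return data;                // the helper has filled this by now
}

// --- in the helper worker ---
onmessage = async ({ data: { key, signal, out } }) => {
  const bytes = await readFromIDB(key); // stub: async IndexedDB get
  new Uint8Array(out).set(bytes);
  const sig = new Int32Array(signal);
  Atomics.store(sig, 0, 1);
  Atomics.notify(sig, 0); // wakes the Atomics.wait() above
};
```

(This only works in dedicated workers; Atomics.wait() throws on the main thread, which is exactly the "golden age" point above.)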