node-lmdb
Basic operations fail under WSL (Windows Subsystem for Linux)
We experienced a persistent failure when running node-lmdb under WSL: the "Windows Subsystem for Linux" thing that gives you a linux-ish kernel and an Ubuntu (or other) userspace. A basic operation, just open a DB and then write out a key, would fail with a funny error like this:
Error: MDB_BAD_TXN: Transaction must abort, has a child, or is invalid
My coworker @FUDCo tracked it down to LMDB expecting certain data-synchronization behavior between ~two filehandles mmap'ed to the same backing file~ `mmap`-based access and normal `write()`-based access to the same fd. That behavior appears to differ between WSL's linux-ish kernel and the systems where it works (it worked fine on macOS and real Linux). He also found a one-line fix in the node-lmdb configuration that tells LMDB not to rely upon that behavior: add `useWritemap: true` to the `open()` options:
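```js
const lmdb = require('node-lmdb');

const dirPath = './mydata'; // hypothetical example directory for the environment
const lmdbEnv = new lmdb.Env();
lmdbEnv.open({
  path: dirPath,
  mapSize: 2 * 1024 * 1024 * 1024, // 2 GiB
  useWritemap: true, // fixes crash
});
```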
I don't know if there's anything special that node-lmdb (or LMDB itself) should do to address this, but I figured I'd mention it here in case anyone else runs into this problem in the future. Chip's full writeup is at https://github.com/Agoric/agoric-sdk/issues/950#issuecomment-618197120
(edited to correct description of data-sync issue)
Thank you for taking the time for this writeup. I don't personally use either Windows or WSL these days, so I can't offer advice, other than to suggest reporting this problem to the upstream LMDB developers, such as @hyc, or to the WSL developers.
I don't believe you should need the `useWritemap` option just to work around an operating system defect.
The LMDB design document specifies that it requires a cache-coherent filesystem. Using writemap is the usual workaround though; it's also needed on OpenBSD for the same reason.
Is there any non-horrible way to test this at startup, and maybe print a warning message if it looks like we're on a non-coherent platform? Chip said one side did a `write()`, and the other read from memory and got zeros instead of the data that was written. Is there an easy way to read from memory just after doing the write (and sync) and see if the results make sense? That might help guide folks towards the workaround more quickly.
(I suppose any such diagnostic might want to go into LMDB proper, rather than the node bindings, but it'd be more actionable if the message can print the exact option you can use to work around the platform limitation, and that option might be spelled/formatted differently depending upon the binding you're using. So maybe putting it in node-lmdb wouldn't be the worst choice)
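A minimal sketch of the startup check being asked about, at the node-lmdb level. It assumes node-lmdb's `Env` / `openDbi` / `beginTxn` API and assumes the problem shows up either as the `MDB_BAD_TXN` error from the original report or as a read-back that doesn't match what was just written. It opens the environment as configured, writes a probe key, reads it back in a read-only transaction (LMDB reads go through the memory map), deletes it, and if anything looks wrong prints a warning naming the exact option before reopening with `useWritemap: true`. The function names and probe key are hypothetical, not part of node-lmdb, and the `lmdb` module is passed in as a parameter so the logic can be exercised without the native addon. This checks for the symptom rather than the coherence property itself, so a clean probe is not a guarantee.

```javascript
'use strict';

// Hypothetical key name; written, read back, and deleted by the probe.
const PROBE_KEY = '__coherence_probe__';

// Writes a key, reads it back through the memory map, then deletes it.
// Returns false if LMDB reports MDB_BAD_TXN or the read-back is wrong.
function writeReadProbe(env) {
  const dbi = env.openDbi({ name: null, create: true }); // main (unnamed) DB
  let txn = null;
  try {
    txn = env.beginTxn();
    txn.putString(dbi, PROBE_KEY, 'ok');
    txn.commit();
    txn = null;

    txn = env.beginTxn({ readOnly: true });
    const value = txn.getString(dbi, PROBE_KEY);
    txn.abort();
    txn = null;
    if (value !== 'ok') return false; // the mapped view did not see the write

    txn = env.beginTxn();
    txn.del(dbi, PROBE_KEY);
    txn.commit();
    txn = null;
    return true;
  } catch (err) {
    if (txn) {
      try { txn.abort(); } catch (_) { /* transaction already finished */ }
    }
    if (/MDB_BAD_TXN/.test(String(err && err.message))) return false;
    throw err; // unrelated failure: surface it to the caller
  } finally {
    dbi.close();
  }
}

// Opens the environment as configured; if the probe fails, warns with the
// exact option to use and reopens with useWritemap: true.
function openEnvWithProbe(lmdb, options, warn = console.warn) {
  const env = new lmdb.Env();
  env.open(options);
  if (options.useWritemap || writeReadProbe(env)) return env;

  env.close();
  warn('node-lmdb: write/read probe failed; this platform may not keep mmap ' +
       'and write() coherent (e.g. WSL1). Reopening with { useWritemap: true }.');
  const retry = new lmdb.Env();
  retry.open(Object.assign({}, options, { useWritemap: true }));
  return retry;
}
```

Usage would look like `const env = openEnvWithProbe(require('node-lmdb'), { path: dirPath, mapSize: 2 * 1024 * 1024 * 1024 });`. If the read-back mismatches, the probe key may be left behind in the main database, and the probe needs a writable environment.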
If you do come up with a reliable way to test this, or query it from the system, then I'd be happy to include that in node-lmdb. I still believe that this should be reported as a bug to the WSL team, though.
Agreed, this is a WSL bug. They've known about it for a long time already https://twitter.com/hyc_symas/status/1013800256088215552
It was working before Windows 10 update 1803, as documented here https://github.com/nimiq/core-js/issues/387
It'd be super-helpful if you could please file an issue in the WSL repo where the WSL devs are most active and where issues in WSL are triaged and tracked. Please include the simplest possible repro case, and provide as much info as possible re. OS & tools versions you're running, etc.
Many thanks.
https://github.com/microsoft/wsl
I am using WSL 2 with node-lmdb & am not seeing any issues. It appears this may now be fixed.
Currently running into this issue using WSL1 on W10, has anyone figured out a fix?
There is no fix for WSL1 other than to switch to WSL2.
Yeah, WSL1 has devastating and unpredictable bugs that can't be worked around. I was just running LMDB on WSL 2 this morning, and it works great.
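Since the replies above narrow the problem to WSL1, an alternative to probing is to guess the platform up front. The sketch below is a heuristic built on an assumption, not an official WSL API: it treats a Linux kernel release string containing `microsoft` but not `microsoft-standard` as WSL1 (WSL1 reports something like `4.4.0-19041-Microsoft`, while WSL2 kernels report something like `5.15.90.1-microsoft-standard-WSL2`). It also enables `useWritemap` on OpenBSD, which was mentioned earlier in the thread as needing it. The helper names are hypothetical, and custom WSL2 kernels may not follow this naming.

```javascript
'use strict';
const os = require('os');

// Heuristic (an assumption, not an official WSL API): WSL1 reports a kernel
// release like "4.4.0-19041-Microsoft"; WSL2 kernels report something like
// "5.15.90.1-microsoft-standard-WSL2". Custom kernels may not follow this.
function looksLikeWsl1(release = os.release()) {
  const r = release.toLowerCase();
  return r.includes('microsoft') && !r.includes('microsoft-standard');
}

// Platforms mentioned in this thread as needing writemap: WSL1 and OpenBSD.
function needsWritemap(platform = process.platform, release = os.release()) {
  return platform === 'openbsd' || (platform === 'linux' && looksLikeWsl1(release));
}

// Hypothetical helper building node-lmdb open() options for this platform.
function lmdbOpenOptions(dirPath, { platform = process.platform, release = os.release() } = {}) {
  return {
    path: dirPath,
    mapSize: 2 * 1024 * 1024 * 1024, // 2 GiB, as in the original report
    useWritemap: needsWritemap(platform, release),
  };
}
```

A caller could pass the result straight to `lmdbEnv.open(lmdbOpenOptions(dirPath))`, or print a warning when `looksLikeWsl1()` returns true so users know that switching to WSL2 is the real fix.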
Ah, that's unfortunate. Very much appreciate the responses though, thank you!