Every opened hypercore keeps a file handle open permanently
This eventually exhausts the file handles available to the process, which causes the application to fail.
There's a branch of hypermerge with a simple test that exhibits the behaviour by creating 2500 hypercores in a stress test.
There are several possible approaches to fixing this. One would be to improve the logic behind streaming hypercores so that we close the file handles we don't need. Another would be to redesign the hypercore logic so that we don't need thousands of cores in the first place.
The latter is probably a more sensible long-term approach but seems hard. The former is probably easier but leaves us with the associated performance problems of having to open so many little files all the time.
@pvh which branch? Also, I imagine the limit on open files depends on the hardware and whatever else is going on in the OS.
One other fix could be to add idling logic to https://github.com/random-access-storage/random-access-file, that is, a timeout after which the file is closed unless any read/write operations occur within that window. However, such a fix might still fail a test that opens 2500 cores at once, although in practice the problem might be mitigated.
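To make it concrete, something roughly like this (untested sketch; `idleClose` and the 30s default are just placeholders, and this isn't a complete random-access-storage implementation, only the shape of the idea):

```ts
// Untested sketch of the idle-timeout idea.
const RandomAccessFile = require('random-access-file')

type Callback<T = void> = (err: Error | null, result?: T) => void

function idleClose(path: string, idleMs = 30_000) {
  let file: any = null
  let timer: ReturnType<typeof setTimeout> | null = null

  const scheduleClose = () => {
    if (timer) clearTimeout(timer)
    timer = setTimeout(() => {
      // No reads/writes for `idleMs`: release the fd. A fresh instance is
      // created lazily on the next operation. (Races with in-flight
      // operations are ignored here for brevity.)
      const f = file
      file = null
      if (f) f.close(() => {})
    }, idleMs)
  }

  const ensure = () => {
    if (!file) file = new RandomAccessFile(path)
    scheduleClose()
    return file
  }

  return {
    read: (offset: number, size: number, cb: Callback<Buffer>) =>
      ensure().read(offset, size, cb),
    write: (offset: number, data: Buffer, cb: Callback) =>
      ensure().write(offset, data, cb)
  }
}
```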
The vast majority of cores are empty and idle, so I think this is pretty safe. I also think an LRU system would be pretty effective.
@pvh you mean for open docs? Is when a doc was opened a good signal for how relevant it still is? It feels like the first doc you opened might still be getting updates, but if it's evicted you might miss those updates.
Am I misunderstanding what you mean or overlooking something?
I guess it may not be an issue if the app does a put for every document on the screen, but I'm under the impression that's not the case right now.
After thinking about it a bit more, I think it would be best to create a hypercore manager with a fixed capacity and let it close descriptors based on usage when the limit is reached.
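Something roughly along these lines (untested sketch; `StorageManager`, `maxOpen` and the rest are placeholder names), using Map insertion order as a cheap LRU:

```ts
// Untested sketch of a fixed-capacity storage manager.
const RandomAccessFile = require('random-access-file')

class StorageManager {
  // Map preserves insertion order, so the first key is the least recently used.
  private open = new Map<string, any>()

  constructor(private maxOpen = 512) {}

  get(path: string): any {
    let file = this.open.get(path)
    if (file) {
      // Re-insert so the most recently used entry sits at the end.
      this.open.delete(path)
    } else {
      if (this.open.size >= this.maxOpen) this.evictLeastRecentlyUsed()
      file = new RandomAccessFile(path)
    }
    this.open.set(path, file)
    return file
  }

  private evictLeastRecentlyUsed() {
    const oldest = this.open.keys().next().value
    if (oldest === undefined) return
    const file = this.open.get(oldest)
    this.open.delete(oldest)
    file.close(() => { /* descriptor released */ })
  }
}
```

One caveat: a closed random-access-file instance can't be reused, so callers would have to go through `get()` for every operation (or the manager would have to hand out re-opening wrappers like the idle-close one above), and evicting a file with in-flight operations needs care.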
@pvh Also, does pushpin ever close any of the open documents? If not, maybe it's best to start there?
@RangerMauve do you know if something along the lines of what was described above already exists or is being worked on in "hyperspace"? I'd rather not spend my time duplicating effort if so.
@pvh wanna assign this issue to me? I'd like to work on it.
I am assuming #54 is the branch mentioned in the description.
All yours! That branch is a reproduction case, yes.
There are a few possible paths here. I wrote a shitty POC which wrapped random-access-file in a function that simply threw away the instance after every request; it was insanely slow but did function correctly. The LRU cache was more difficult than I expected because, well, the API has a bunch of weird internal decisions.
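Roughly what that throw-away wrapper looked like (reconstructed sketch, not the actual POC code): every read/write opens a fresh random-access-file and closes it as soon as the operation completes, so only in-flight fds are ever held open.

```ts
// Reconstructed sketch of the throw-away approach.
const RandomAccessFile = require('random-access-file')

function throwAwayStorage(path: string) {
  return {
    read(offset: number, size: number, cb: (err: Error | null, buf?: Buffer) => void) {
      const file = new RandomAccessFile(path)
      file.read(offset, size, (err: Error | null, buf?: Buffer) => {
        // Close immediately after every request; correct, but the constant
        // open/close churn is what made this so slow.
        file.close(() => cb(err, buf))
      })
    },
    write(offset: number, data: Buffer, cb: (err: Error | null) => void) {
      const file = new RandomAccessFile(path)
      file.write(offset, data, (err: Error | null) => {
        file.close(() => cb(err))
      })
    }
  }
}
```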
It's also worth noting that at some point the APIs for this system were changed to stream blocks through one-by-one instead of using getBatch() and as a result performance was (probably) substantially degraded.
Regarding this issue, I don't recall specific solutions off the top of my head, but I'm sure folks in the kappa-core space have been thinking about it a lot more.
CC @cblgh @noffle @frando
In cabal land we haven't hit the issue (yet) of running out of fds.