flux-core
flux-dump/fsck: improve performance w/ asynchronous calls
While working on #6589 (whose code was initially copied from flux-dump), I noticed that most of the calls to the content modules are synchronous. In most cases this is probably fine (I'm not sure we'd want to walk multiple directory paths in parallel, though that could be debated).
But at minimum, the load of every content blob in a valref could be done asynchronously and in parallel. For large valrefs, such as the stdout/stderr of a job, the synchronous lookups could take a long time.
During a chat, @garlick noted that this may not necessarily be a good thing: since we may run flux-dump on a live system, large parallel lookups could cause a mini-DoS.
This is in contrast to flux-fsck(1), which could benefit from slight parallelization. Since we generally won't run it on a live system, and the data itself isn't necessarily needed, parallelization is less of an issue there.
I'd argue that the DoS risk is easily addressed by adding a parameter to disable or limit the parallelism.
In the case of shutting down a large system instance, it would be a very good thing if flux-dump runtime were improved.
Great point!
Making flux-dump asynchronous proved trickier than I'd hoped. flux-dump often runs with the content module loaded, so the async calls can complete out of order (i.e. cached vs. uncached blobs). The "simple" idea would be to build an array of futures and save them off, but this comment is a bit scary:
/* N.B. first attempt was to save the futures in an array, but ran into:
* flux: ev_epoll.c:134: epoll_modify: Assertion `("libev: I/O watcher
* with invalid fd found in epoll_ctl", errno != EBADF && errno != ELOOP
* && errno != EINVAL)' failed.
* while archiving a resource.eventlog with 781 entries. Instead of
* retaining the futures for a second pass, just retain references to the
* content.load response messages.
*/
Still quite doable, just trickier than I'd hoped. We'll start with flux-fsck because that's easier.
Regarding the N.B. comment above: if we assume it's still an issue, then instead of an array of futures you'd want an array of response messages extracted from the futures (after which the futures are discarded).
I proposed developing a shared class that wraps the valref traversal of one key in a future. I think that would simplify both utilities and allow some tricky code to be shared.
This issue came up again in a meeting recently. The time to shut down Flux on busy system instances is a pain point for sysadmins.