job-info: update watchers lists may not scale well
In #6709, it's noted that job-info cancellation lists can be slow if there are a lot of watchers. That cost scales mostly with the total number of job submissions, i.e. the total number of clients watching eventlogs and the like.
Update watchers in job-info are different: they only scale poorly with the total number of running jobs (i.e. the jobs running right now).
This area requires a bit more rework if we want to scale it out: the msglist should become a zlistx_t, and we probably want to add "matchtag lookup keys" to some struct, similar to what was done in #6709 (see the sketch below).
This appears to be a lower-priority scaling issue, so I'm separating it out to be revisited later if it becomes a problem.
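To make that concrete, here is a rough sketch of the shape such a rework might take. This is an illustration under assumptions, not the actual job-info code: the structs and field names are hypothetical, while `flux_msg_route_first()` and `flux_msg_get_matchtag()` are existing flux-core calls.

```c
/* Hypothetical sketch only, not the actual job-info structs.  The idea:
 * keep the watchers of a namespace-key combo in a zlistx_t (so a single
 * entry can be detached in O(1) via its list handle), and add a hash keyed
 * on "uuid:matchtag" so a targeted cancel can find its watcher without
 * scanning the whole list, similar in spirit to what #6709 did.
 */
#define _GNU_SOURCE       /* asprintf */
#include <stdio.h>
#include <czmq.h>         /* zlistx_t, zhashx_t */
#include <flux/core.h>

struct update_watcher {
    const flux_msg_t *request;   /* original update-watch request */
    void *list_handle;           /* zlistx_t handle, for O(1) detach */
};

struct update_watch_entry {
    zlistx_t *watchers;          /* all watchers of one namespace-key combo */
    zhashx_t *watchers_bytag;    /* "uuid:matchtag" -> struct update_watcher */
};

/* Build a lookup key from the requesting client's route (uuid) and the
 * request matchtag.  Caller frees.
 */
static char *watcher_key (const flux_msg_t *msg)
{
    const char *uuid = flux_msg_route_first (msg);
    uint32_t matchtag;
    char *key;

    if (!uuid || flux_msg_get_matchtag (msg, &matchtag) < 0)
        return NULL;
    if (asprintf (&key, "%s:%u", uuid, (unsigned int)matchtag) < 0)
        return NULL;
    return key;
}
```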
There are two "update watch" loops of concern at scale:

1. loop over all namespace-key combos being watched
2. loop over all the watchers of a specific namespace-key combo

The first is only an issue with a large number of namespaces, i.e. a large number of jobs. The second is presumably only an issue with very large jobs, i.e. many job shells watching for updates to jobspec or R.
Note that these are somewhat orthogonal to each other:

- having many namespace-key combos probably means a high-throughput workload (i.e. tons of small jobs)
- having many watchers of a specific namespace-key combo probably means a big job (i.e. a single job running on tons of cores)

So improving the performance of one of these loops for its specific case probably doesn't help with the other case.
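For illustration, roughly the shape of those two loops. This is a sketch under assumptions: the struct names and layout are not the actual job-info module code, but the flux msglist and respond calls are real flux-core APIs.

```c
#include <string.h>
#include <jansson.h>
#include <czmq.h>
#include <flux/core.h>

/* Illustrative sketch of the two loops described above (not the actual
 * job-info module code).  Loop 1 walks every namespace-key combo that has
 * watchers; loop 2 walks every watcher of a matching combo.
 */
struct watch_entry {
    char *ns;                        /* job's guest KVS namespace */
    char *key;                       /* e.g. "R" or "jobspec" */
    struct flux_msglist *watchers;   /* one request msg per watching client */
};

struct update_watch {
    zlistx_t *entries;               /* one watch_entry per namespace-key combo */
};

static void notify_update (flux_t *h, struct update_watch *uw,
                           const char *ns, const char *key, json_t *value)
{
    struct watch_entry *we = zlistx_first (uw->entries);
    while (we) {                                            /* loop 1 */
        if (!strcmp (we->ns, ns) && !strcmp (we->key, key)) {
            const flux_msg_t *msg = flux_msglist_first (we->watchers);
            while (msg) {                                   /* loop 2 */
                if (flux_respond_pack (h, msg, "{s:O}", we->key, value) < 0)
                    flux_log_error (h, "error responding to update-watch");
                msg = flux_msglist_next (we->watchers);
            }
        }
        we = zlistx_next (uw->entries);
    }
}
```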
Somewhat surprisingly, implementing another uuid+matchtag hash similar to what was done in #6695 and #6709 did not initially appear to help. The realization was that the major user of job-info.update-watch, the job shell, does not actually cancel its watcher.
Adding ...
```diff
--- a/src/shell/info.c
+++ b/src/shell/info.c
@@ -332,7 +332,20 @@ void shell_info_destroy (struct shell_info *info)
 {
     if (info) {
         int saved_errno = errno;
-        flux_future_destroy (info->R_watch_future);
+        if (info->R_watch_future) {
+            flux_t *h = flux_future_get_flux (info->R_watch_future);
+            flux_future_t *f;
+            f = flux_rpc_pack (h,
+                               "job-info.update-watch-cancel",
+                               FLUX_NODEID_ANY,
+                               FLUX_RPC_NORESPONSE,
+                               "{s:i}",
+                               "matchtag",
+                               flux_rpc_get_matchtag (info->R_watch_future));
+            if (!f)
+                shell_log_error ("job-info.update-watch-cancel failed");
+            flux_future_destroy (info->R_watch_future);
+        }
         json_decref (info->R);
         jobspec_destroy (info->jobspec);
         rcalc_destroy (info->rcalc);
```
did seem to improve things ever so slightly.
While adding a hash to the job-info update-watch code appears to be a small net win, it is small enough to be questionable. Given that the job shell doesn't even call cancel, and given #6721, it's a debatable win. Perhaps that will change in the future if more update-watch keys are supported (more than just R and jobspec).
For now, I will park the work here for consideration down the road:
https://github.com/chu11/flux-core/tree/issue6990_kvs_watch_part_4
Shouldn't the watch be canceled by the disconnect when the shell exits and closes its handle?
> Shouldn't the watch be canceled by the disconnect when the shell exits and closes its handle?
Yes, perhaps I should have said it is not canceled "directly" but rather via the disconnect. Because disconnects don't carry matchtags, those watchers can't be cleaned up quickly; the disconnect handler just loops through everything to see what should be cleaned up (see #6721).
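For illustration, a minimal sketch of why the disconnect path has to scan. `flux_msglist_disconnect()` is a real flux-core helper that drops queued messages matching a disconnect's sender; the handler shape and the one-list-per-combo assumption here are hypothetical.

```c
#include <flux/core.h>

/* Illustrative sketch only.  A cancel request carries the matchtag of the
 * watch being canceled, so a uuid+matchtag hash allows O(1) lookup.  A
 * disconnect message identifies only the departing client, so there is
 * nothing to look up by matchtag; every watcher list has to be scanned.
 * flux_msglist_disconnect() performs that scan for a single list.
 */
static void disconnect_cb (flux_t *h, flux_msg_handler_t *mh,
                           const flux_msg_t *msg, void *arg)
{
    struct flux_msglist *watchers = arg;   /* watchers of one namespace-key combo */

    /* In the real module this step would be repeated for every
     * namespace-key combo currently being watched; that repetition is
     * what #6721 is about. */
    if (flux_msglist_disconnect (watchers, msg) < 0)
        flux_log_error (h, "error handling disconnect");
}
```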
Ah, ok. That makes sense now :-)
I wonder if there's a way to fix the disconnect case. There are probably lots of cases of that lingering around...
Yeah, when I discovered it I realized "oh crap, we could improve performance by NOT canceling watchers." To some loose extent, the work to solve #6695 and #6709 turned two iterations over watchers into one. Since the update watcher for R in the job shell was only doing one iteration to begin with, the addition of the cancel was actually a net loss.