flux-core
flux-core copied to clipboard
kvs: uncompleted RPCs should receive ENOSYS response when module is unloaded
Problem: we ran flux module reload kvs on a live system and it seemed evident that pending commit requests did not receive an error.
I had thought that rpc_track would ensure that pending RPCs get an ENOSYS when the module handling them is unloaded but that may not be the case. It probably should be so that requests don't hang forever.
Yeah we don't do this for modules. I was confusing this with the generation of <service>.disconnect messages when a module unloads so that any services it is using are notified that it's not there anymore.
I think the rationale was that modules should do their own responding to RPCs they are tracking since they do have the opportunity to do so. Changing the name of this issue to target the KVS module.
began looking at this and I think the commit side of this is more doable b/c we "track" the requests within the transaction manager already.
However, on the lookup/get side, we do not track current requests at all, making this more annoying/difficult.
Oh this might get slightly ugly since like to "replay" lookups that are stalled for some reason. We'd have to ensure a lookup isn't already in the msglist or whatever.
Oh this might get slightly ugly since like to "replay" lookups that are stalled for some reason. We'd have to ensure a lookup isn't already in the msglist or whatever.
Doh! And stalls/replays also happen on the commit side, I didn't realize till just now
if (!(root = getroot (ctx, ns, mh, msg, NULL, &stall))) {
if (stall)
return;
goto error;
}
if (!(tr = treq_create_rank (ctx->rank, ctx->seq++, 1, flags))) {
flux_log_error (h, "%s: treq_create_rank", __FUNCTION__);
goto error;
}
Perhaps a running msglist of all uncompleted requests is probably the simplest way to handle this on both sides.
Edit: oh wait, msglist is maybe not good given how long the requests could be with the KVS. This isn't like disconnect where it's a presumed rare activity. We'll adding/removing all the time. hmmmm
hmmm should we be handling this in a lot more modules than we do? Just skimmed kvs-watch and job-info and I don't think we handle it there either.