flux-core
idea: add a method to determine the context within Flux in which a process is currently running
As noted in #3744, it would be useful to have some way for a process to determine the context in which it is running as it relates to Flux jobs, initial program, etc. Off the top of my head, I can think of a few different contexts we might want to delineate:
1. Not within any Flux instance (`flux_open ()` fails with `ENOENT`)
2. Enclosing instance is the multi-user system instance (`instance-level` attribute is `0`, `security.owner` != current UID, `jobid` attribute not set)
3. Enclosing instance is a job in a foreign RM or a `flux start --test-size` session, and process is running as part of the initial program (same as above, but `security.owner` == current UID, `instance-level` is `0`)
4. Enclosing instance is a Flux job and process is running as part of the initial program (`jobid` is set, `instance-level` > 0)
5. Enclosing instance is a Flux job and process is part of a job within that instance
AFAICT, there is not a good way to easily determine the difference between 4 and 5 above. Perhaps less importantly, there is not a clean way to tell the difference between 2 and 3 either (for example, when a process is running with the UID of the `flux` user).
It might be nice if we could add a function that would return "something" to allow a process to differentiate between these different contexts. Since "context" is actually a bit of an overloaded term, we might need something different, but the only idea I've come up with so far is to have a set of named process "scopes". This should be just considered an early idea at this point and we can iterate as much as people desire, or even throw out this idea as unnecessary if it will cause too much confusion.
Here's a first cut at names for the "scopes" outlined above:
- 1: none
- 2: system
- 3,4: initial program (I suppose instance-level could be used to differentiate these two)
- 5: job
We could add an API call `flux_get_process_scope(3)` which would return one of these strings and would allow programs to alter their behavior based on their current context. For example, `flux bcast` could abort with a warning if run in `job` scope, since it likely doesn't make sense to run that command as a job.
A `flux scope` command could simply print the result of `flux_get_process_scope(3)` for use in scripts, etc.
While discussing with @ofaaland the repercussions of our inability to determine whether a process is running in the "scope" of a Flux instance or of a job within a Flux instance, we had the idea to use a simple environment variable set by the job shell but cleared by the `flux-broker`. Keying off this environment variable would allow `flux_get_process_scope()` or similar to determine whether the scope is `job` or `initial program` (perhaps `instance` is a better name for that one, I don't know).
This would be trivial to implement and would assist @ofaaland's use case immediately.
For now, the `FLUX_KVS_NAMESPACE` environment variable could be used as a stand-in for any future environment variable, since it is set only for jobs and cleared for the initial program.
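A minimal sketch of that stand-in check, assuming only what is stated above (the job shell sets `FLUX_KVS_NAMESPACE` for jobs, and the broker clears it for the initial program). The function names are hypothetical, and treating an empty value the same as an unset one is an assumption of this sketch.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Pure helper: decide from the value of FLUX_KVS_NAMESPACE (or NULL).
 * Assumption: an empty value is treated the same as an unset one. */
bool is_job_from_namespace (const char *ns)
{
    return ns != NULL && ns[0] != '\0';
}

/* Hypothetical stand-in until a dedicated variable exists:
 * true for job tasks, false for the initial program. */
bool running_as_job (void)
{
    return is_job_from_namespace (getenv ("FLUX_KVS_NAMESPACE"));
}
```

Keeping the decision in a pure helper means the heuristic can be swapped for a dedicated variable later without touching callers of `running_as_job()`.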
It sounds like this could be helpful. Were you thinking the prototype would be something like this?
`const char *flux_get_process_scope (void);`
Maybe `init` would be OK as an abbreviation for `initial program`? A short, one-word scope would be a little nicer popping out of a `flux scope` command.
Or perhaps this?
`enum flux_process_scope { init, job };`
`enum flux_process_scope flux_get_process_scope (void);`
Then the consumer can use the returned value directly in a conditional expression, and avoid bugs like strcmp(scope, "iniital").
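A sketch of the enum variant, with hypothetical `FLUX_SCOPE_*` enumerators (prefixed so they cannot collide with ordinary identifiers such as a variable named `job`) and a hypothetical `flux_process_scope_name()` helper that a `flux scope` command could use to print the short names. `refuse_in_job_scope()` is also hypothetical and only illustrates the `flux bcast` example above.

```c
#include <stdio.h>

/* Hypothetical enum; prefixed enumerators avoid name clashes. */
enum flux_process_scope {
    FLUX_SCOPE_NONE,
    FLUX_SCOPE_SYSTEM,
    FLUX_SCOPE_INIT,
    FLUX_SCOPE_JOB,
};

/* Short, one-word names suitable for a "flux scope" command to print. */
const char *flux_process_scope_name (enum flux_process_scope scope)
{
    switch (scope) {
        case FLUX_SCOPE_NONE:
            return "none";
        case FLUX_SCOPE_SYSTEM:
            return "system";
        case FLUX_SCOPE_INIT:
            return "init";
        case FLUX_SCOPE_JOB:
            return "job";
    }
    return "unknown";
}

/* Example consumer: a misspelled enumerator is a compile error, unlike
 * strcmp (scope, "iniital"). Returns 1 if the command should refuse. */
int refuse_in_job_scope (enum flux_process_scope scope, const char *cmd)
{
    if (scope == FLUX_SCOPE_JOB) {
        fprintf (stderr, "%s: refusing to run inside a job\n", cmd);
        return 1;
    }
    return 0;
}
```

Leaving out a `default:` label in the switch lets compilers warn (e.g. GCC's `-Wswitch`, enabled by `-Wall`) if a new scope is added without a name.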
(In reply to Jim Garlick, Tuesday, December 7, 2021 9:32 PM)
The ability to distinguish between 1, 2, and 3/4 would be very useful for some workflow systems I either know about or work on directly.
Apologies, what is the difference between #4 and #5 above? There's a subtlety I'm missing.
The initial program (4) is not running as a job in its instance. It's just spawned directly by the broker. If there's a FLUX_JOB_ID set in its environment, it's the job ID of the flux instance in its enclosing instance.
The job (5) on the other hand is spawned by the flux shell and has a job ID in the flux instance.
Edit: confusing, hence the need for tools :-)
@garlick ahh, so basically "flux start foo.sh" vs "flux start flux mini run foo.sh"
In real-world terms, (4) is a batch script and its associated processes (including the `flux mini run` in your example), while (5) is the actual parallel job tasks.
FWIW, if the documentation includes one example command or situation for each state, it might go a long way towards helping users understand what the states are and how to use Flux correctly.
(In reply to Al Chu, Monday, October 17, 2022 11:33 AM)
I'm slowly beginning to work on this, and among the contexts listed above, a few were hard to distinguish in my head. Thinking about it, I believe there are two different things we are trying to differentiate:
- what "flux instance" am I running under, i.e. system instance, user instance (e.g. `flux start --test-size`), or job instance (e.g. `flux mini submit flux start`)
- am I the initial program or a job
Would two separate functions for these two things be better? It seems like we're mixing two things together into one.
As an aside, when I started "permutating" things, I couldn't understand why the potential scopes weren't:
- 1: none
- ~~2A: enclosing instance is the system instance, process is initial program~~
- 2B: enclosing instance is the system instance, process is a job
- 3A: enclosing instance is a job in a foreign RM / `flux start --test-size`, process is initial program
- 3B: enclosing instance is a job in a foreign RM / `flux start --test-size`, process is a job
- 4: enclosing instance is a Flux job, process is initial program
- 5: enclosing instance is a Flux job, process is a job
I suppose 2A is only conceptually possible (although practically stupid)?
Edit: Oh wait, the system instance is started via systemd, so I think 2A is impossible?
> Would two separate functions for these two things be better? It seems like we're mixing two things together into one.
There are already simple ways to determine if the enclosing instance is a system instance vs. a single-user instance, or if you are in an initial program or a job (actually, we have hit a problem here since `FLUX_JOB_ID` is set for the initial program, but that can be fixed).
I think the purpose of this issue is to add a single function that makes it easy for a caller to determine their rough "context", so that callers can make simple decisions with a single call to the Flux API.
> As an aside, when I started "permutating" things, I couldn't understand why the potential scopes weren't...
I think the initial scopes listed above were the conclusion of the particular use cases we had in mind, i.e. these were the 3 or 4 cases that were important to differentiate. There is a balance between adding every permutation and keeping the call useful: we don't want every caller to need a long conditional to match every case where the current process is part of an initial program (i.e. a batch script). IMO it is better to keep the interface simple and cater to the common use case.
Edit: I meant to say that if we find a need to differentiate a couple of other cases, that is fine too, but we should err on the side of simplicity. (E.g., I can't think of a reason a process would need to know whether it was in the "initial program" of a job running in a system instance vs. a single-user Flux instance vs. a foreign RM vs. `flux start --test-size`.) The whole point of Flux is that it shouldn't matter, and if it does (e.g. you need to talk to the parent), then you can further refine by checking attributes.
I think if we create a `flux_get_process_scope()` API call, we should be sure it returns something sensible no matter where it is used. Looking over the current PR, it would seem to fall short when called from a `flux-proxy` environment, or from anything running as the instance owner in the system instance (cron jobs, perilog scripts, rc scripts).
Also, if we need a broker connection to obtain attributes to make the determination, it seems like we should allow that to be passed in to the API call so that a user doesn't have to connect to the broker twice (assuming they want to do more fluxish stuff), but then how do we know that the broker connection is the correct one?
IMHO it might be wise at this stage to provide a `flux_get_remaining_time()` call or similar, to constrain any heuristics to this one use case.
Sorry @chu11 to make this discouraging comment after a PR is already posted. I find this problem confusing to think about and the PR actually helped me make more sense of it than when we were discussing it here in the abstract.