systemd

core: introduce "owner" concept for transient units

Open: YHNdnzj opened this issue 8 months ago • 5 comments

YHNdnzj, Apr 18 '25

Hmm, why make this specific to transient units? I mean, sure, you want to use it with that, but it appears to me that there's no reason to limit it to that.

To me it appears this should be part of the StopWhenUnneeded concept, which auto-stops a unit when the last unit that has a wants/requirement dep on it goes away. I think we might want to extend that, i.e. so that it would become "stop when no req/wants unit, or when some process is up".
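One possible reading of the proposed extension, sketched in C under loudly stated assumptions: the `Unit` struct, its counters and `unit_is_unneeded()` are hypothetical stand-ins, not systemd's actual data structures. Under this reading, a unit with StopWhenUnneeded= counts as needed while either an active unit with a Wants=/Requires= dependency on it or an owner process is still up, and becomes eligible for stopping only once both are gone.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified unit: the counters stand in for systemd's
 * real dependency and ownership bookkeeping. */
typedef struct Unit {
        bool stop_when_unneeded;     /* StopWhenUnneeded=yes */
        size_t n_active_dependents;  /* active units with Wants=/Requires= on this one */
        size_t n_live_owners;        /* owner processes still running (assumed field) */
} Unit;

/* Existing rule (simplified): unneeded once no active unit wants/requires it.
 * Extended rule (as read here): additionally, no owner process is alive. */
static bool unit_is_unneeded(const Unit *u) {
        if (!u->stop_when_unneeded)
                return false;
        return u->n_active_dependents == 0 && u->n_live_owners == 0;
}
```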

And then I wonder how the AddRef logic relates to all that, i.e. one is about keeping units active, the other about keeping them loaded. Maybe we should have some symmetry here?

This PR pins by process. The AddRef logic pins by bus peer address, which makes me wonder what is right here: pinning by process or pinning by IPC connection? Or are both right? If so, keep the door open for both in the API design (even if only one is implemented right now)?

poettering, Apr 22 '25

> Hmm, why make this specific to transient units? I mean, sure, you want to use it with that, but it appears to me that there's no reason to limit it to that.

> To me it appears this should be part of the StopWhenUnneeded concept, which auto-stops a unit when the last unit that has a wants/requirement dep on it goes away. I think we might want to extend that, i.e. so that it would become "stop when no req/wants unit, or when some process is up".

Well, the way I see it, transient units are indeed special here, in that they are created entirely on the fly and are mostly isolated, i.e. external. Hence it makes sense to bind their lifetime to external resources (processes, in this case). StopWhenUnneeded, OTOH, is very much internal to our job logic. Conflating the two concepts feels kinda weird to me.

Or, put differently: units that come from fragments on disk are owned by the system and can have a huge impact on special targets and such, so they really should not be subject to external "owners" here. The process invoking StartTransientUnit matters to some extent, but the caller of StartUnit does not.

> And then I wonder how the AddRef logic relates to all that, i.e. one is about keeping units active, the other about keeping them loaded. Maybe we should have some symmetry here?

> This PR pins by process. The AddRef logic pins by bus peer address, which makes me wonder what is right here: pinning by process or pinning by IPC connection? Or are both right? If so, keep the door open for both in the API design (even if only one is implemented right now)?

I was wondering the same. I opted for process tracking because run0 needs to handle bus disconnection gracefully. I do see a point in tracking this via the bus, namely with --machine=, where PID 1 can't track processes outside of the container's PID namespace. But that's still not useful in the case of run0...
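A minimal, Linux-only sketch of why process tracking can survive a bus disconnect: the owner is watched through a pidfd (pidfd_open(2), Linux 5.3 or newer), which becomes readable once the process exits, independently of any bus connection. The helper names are hypothetical and this is not the PR's implementation. It also shows the --machine= limitation in miniature: pidfd_open() can only name a process by its PID in the caller's own PID namespace.

```c
#define _GNU_SOURCE
#include <poll.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434 /* same number on most architectures; Linux >= 5.3 */
#endif

/* Obtain a pidfd for the (hypothetical) owner process. Unlike a bare PID,
 * a pidfd cannot silently end up referring to a different process after PID reuse. */
static int owner_pidfd_open(pid_t pid) {
        return (int) syscall(SYS_pidfd_open, pid, 0);
}

/* A pidfd polls readable once the process has terminated.
 * Returns 1 if the owner is gone, 0 on timeout, -1 on error. */
static int owner_wait_gone(int pidfd, int timeout_ms) {
        struct pollfd p = { .fd = pidfd, .events = POLLIN };
        int r = poll(&p, 1, timeout_ms);
        if (r < 0)
                return -1;
        return (r > 0 && (p.revents & POLLIN)) ? 1 : 0;
}

/* Demo: fork an "owner" that exits immediately and check that its exit is
 * observed without any bus connection being involved. */
static int demo_owner_exit_detected(void) {
        pid_t child = fork();
        if (child < 0)
                return -1;
        if (child == 0)
                _exit(0);

        int pidfd = owner_pidfd_open(child);
        int gone = pidfd < 0 ? -1 : owner_wait_gone(pidfd, 5000);

        if (pidfd >= 0)
                close(pidfd);
        waitpid(child, NULL, 0); /* reap the zombie */
        return gone;
}
```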

YHNdnzj, Apr 22 '25

So I think we really should do ownership by IPC connection, not by PID/pidfd, simply because we want to keep the door open for remote IPC clients. I.e. I think it should be fine for a remote client to pin a local unit, and a remote client will not have a trackable local PID...
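For comparison, a minimal sketch of pinning by connection: the owner is whatever sits at the other end of a stream socket, and its departure shows up as end-of-file, regardless of whether the peer is a local process or a remote client with no local PID. A plain AF_UNIX socketpair stands in for the IPC connection here; the function names are hypothetical, and real bus clients would be tracked through the bus rather than a raw socket.

```c
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the peer on fd has gone away, 0 if it is still connected,
 * -1 on an unexpected error. Never blocks and never consumes data. */
static int owner_connection_closed(int fd) {
        char c;
        ssize_t n = recv(fd, &c, 1, MSG_PEEK | MSG_DONTWAIT);

        if (n == 0)
                return 1;   /* orderly shutdown: owner is gone */
        if (n > 0)
                return 0;   /* data pending: owner still there */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
                return 0;   /* idle but still connected */
        if (errno == ECONNRESET)
                return 1;   /* connection torn down abruptly */
        return -1;
}

/* Demo: a socketpair stands in for the owner's IPC connection. */
static int demo_connection_owner(void) {
        int sv[2];
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
                return -1;

        int before = owner_connection_closed(sv[0]); /* still connected */
        close(sv[1]);                                 /* owner disconnects */
        int after = owner_connection_closed(sv[0]);  /* peer gone */

        close(sv[0]);
        return before == 0 && after == 1;
}
```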

poettering, May 14 '25

> So I think we really should do ownership by IPC connection, not by PID/pidfd. Simply because we want to keep the door open for remote IPC clients. i.e. I think it should be fine if a remote client pins a local unit, and a remote client will not have a trackable local PID...

But still, it won't be useful for run0. I think both make sense for different use cases, and adding one now doesn't block the future introduction of the other...
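One way to keep the door open for both, sketched with hypothetical types that are not part of the PR: tag each owner with its kind, so connection-based ownership can be added later without changing how existing process owners are stored or evaluated. Which kinds are accepted is an assumption of the sketch (process only, for now).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical owner descriptor: the kind tag leaves room for
 * connection-based ownership without changing existing callers. */
typedef enum OwnerKind {
        OWNER_PROCESS,     /* pinned by a local process (e.g. via pidfd) */
        OWNER_CONNECTION,  /* pinned by an IPC connection, possibly remote */
} OwnerKind;

typedef struct Owner {
        OwnerKind kind;
        int fd;            /* pidfd or connection fd, depending on kind */
        bool alive;        /* updated by whatever watches fd */
} Owner;

/* True while at least one registered owner, of any kind, is alive;
 * false once all are gone (or if none was registered). */
static bool owners_keep_unit_up(const Owner *owners, size_t n) {
        for (size_t i = 0; i < n; i++)
                if (owners[i].alive)
                        return true;
        return false;
}

/* Assumption: only process ownership is implemented so far, so other
 * kinds are rejected up front instead of being silently ignored. */
static bool owner_kind_supported(OwnerKind kind) {
        return kind == OWNER_PROCESS;
}
```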

YHNdnzj, May 14 '25

> Hmm, why is this only allowed for transient units? I think it'd be nice to extend this to normal units too, if there is no particular technical limitation that would prevent that. For example, let's imagine that we have some service unit, and the user is allowed to start that service via polkit, and uses a process to make use of the service. KOLE would work here nicely: we start the unit, start the consumer process as the owner, and as soon as it's finished or dies or whatever, the service is stopped too. Any consideration that would apply to a transient unit applies here too.

There is, though: currently our D-Bus methods have no way of accepting auxiliary data like this. We can of course add a couple more if there's a compelling enough use case, but so far I haven't seen any (also discussed in https://github.com/systemd/systemd/pull/37180#issuecomment-2822064461). Let's leave this for later, if that ever happens. Until then, it feels questionable to extend this deliberately.

YHNdnzj, May 16 '25