fibers
fibers copied to clipboard
thread-local fluids don't work like they're supposed to
Consider the following code, used with Fibers 1.0.0 as packaged in Guix System:
(use-modules (fibers)
(fibers conditions))
(define some-fluid (make-thread-local-fluid 1234567))
(format #t "Main thread initial value: ~A~%" (fluid-ref some-fluid))
(fluid-set! some-fluid 1337)
(join-thread
(call-with-new-thread
(lambda ()
(format #t "Guile thread value: ~A~%" (fluid-ref some-fluid)))))
(run-fibers
(lambda ()
(let ((condition (make-condition)))
(spawn-fiber (lambda ()
(format #t "Fiber 1 value: ~A~%" (fluid-ref some-fluid))
(fluid-set! some-fluid 42)
(signal-condition! condition)))
(spawn-fiber (lambda ()
(wait condition)
(format #t "Fiber 2 value: ~A~%" (fluid-ref some-fluid))))))
#:drain? #t)
(format #t "Main thread final value: ~A~%" (fluid-ref some-fluid))
It produces this output:
Main thread initial value: #f
Guile thread value: #f
Fiber 1 value: 1337
Fiber 2 value: 42
Main thread final value: 42
There are a number of things going wrong here.
- The default value of some-fluid, set when it is created, seems to be ignored in favor of
#f. This is a bug in guile I've reported as bug 36915. - In the fiber, the value of
some-fluidis inherited. This is the opposite of what might be expected to happen based on section 6.22.2 of the Guile manual, but is in line with section 2.1 of the Fibers manual, where it says "The fiber will inherit the fluid-value associations (the dynamic state) in place when 'spawn-fiber' is called". It would be nice to have the ability to use thread-local variables in fibers, though. - When the value of
some-fluidis set from within the fiber, the change takes effect outside of that fiber, which is the opposite of what might be expected to happen based on section 2.1 of the Fibers manual. There it says: "Any 'fluid-set!' or parameter set within the fiber will not affect fluid or parameter bindings outside the fiber".
After a bit of reading Guile's source, it looks like the intent behind thread-local fluids is that they can't be directly manipulated by any guile-side code - that way there's no risk of values being visible in other threads. It also means that there's currently no way of making this particular feature make sense in Fibers. It works with guile-native threads because the setup for them calls guilify_self_2, which sets up the new thread's dynamic_state with an empty set of thread_local_values, and there's no need to ever switch that around because it's part of SCM_I_CURRENT_THREAD, which is different per-thread.
In future versions of Guile it may be possible to make it work, and that'd be nice. So I'll leave this issue open for now, if that's okay.
I still have no clue what's up with issue 1, though.
Issue 1 has been solved in the latest version of (I'm using Guile 3.0.2).
make-thread-local-fluid function seems to be a remnant of 2005-03-08 rewrite by Marius Vollmer.
It's retained in Guile 2.2 release, there was a make-dynamic-state function before. NEWS for 2.2 release:
** `make-dynamic-state'
Use `current-dynamic-state' to get an immutable copy of the current fluid-value associations.
So in this code snippet make-thread-local-fluid should be replaced with make-fluid.
Works correctly with make-fluid in current Guile version, 3.0.5:
Guile thread value: 1337
Fiber 1 value: 1337
Fiber 2 value: 1337
Main thread final value: 1337
Value 42 of some-fluid is visible only in dynamic extent of the first fiber.
2. It would be nice to have the ability to use thread-local variables in fibers, though.
Fiber can be on any random thread when #:parallel? #t argument for spawn-fiber is used, or after work stealing in schedulers. Maybe thread locals could be useful for some interoperability with C libraries.
Maybe we could have separate make-thread-local-fluid and make-fiber-local-fluid? I never use thread local fluids though, so I don't know whether this would be useful.
i think i'm looking at a manifestation of this, or something similar.
it's in Shepherd, the init process of Guix. it's a single threaded program that uses fibers for cooperative multitasking. it has a
(define current-process-monitor
;; Channel to communicate with the process monitoring fiber.
(make-parameter #f))
the symptom is that even though the entire daemon is wrapped in the dynamic extent of a parameterize that sets this fluid, at some point an assert dies because this fluid is #f.
is my expectation valid, namely that if there are no posix threads, and everything happens under the dynamic extent of a parameterize, then i should see the same fluid value throughout my entire codebase? assuming, of course, that nothing sets the parameter besides the initial binding.
what seems to trigger this is the reloading of the shepherd config, which creates a temp module, loads some code into it, and evals it. this code spawns fibers, and some figers are exiting, but all under the parameterize.
$ guile --version
guile (GNU Guile) 3.0.9
name: guile-fibers
version: 1.3.1
also pinging @civodul for this.
@attila-lendvai That's not a thread-local fluid (in this case, parameter), but that does sound vaguely like something that could be caused by Fibers useof 'with-dynamic-state/current-dynamic-state'. (Still sounds weird though, will think a little more about it -- as you describe the situation, even if there were posix threads in play, that seems impossible, so I think the actual situation is different.)
I think the bug here is that 'spawn-fiber' does all this with-dynamic-state/current-dynamic-state stuff and I think that stuff should simply be removed (after checking why Fibers does this ‘isolation of fluid and parameter mutations’, to use the docstring's wording, to make sure the cure doesn't cause a new isssue/reintroduce an old issue).
If you want to spawn a fiber with isolated state, you can still do so yourself with little trouble, but AFAIK it is impossible to de-isolate a state ...
@attila-lendvai here is some code testing the situation you are describing ... it doesn't reproduce it, I propose minimizing the test case.
(use-modules (fibers) (fibers timers))
(define p (make-parameter #false))
(parameterize ((p #true))
(run-fibers
(lambda ()
(pk (p))
(spawn-fiber (lambda () (pk (p))))
(sleep 1))
#:parallelism 1 #:hz 0)
Hypothesis: shepherd installs a signal handler, which in turn tries to invoke the process monitor. But the signal was sent to a thread that isn't the main thread (created before the 'parameterize', or even created inside but outside Guile's (ice-9 threads)/SRFI equivalent so Guile doesn't remember where the new thread originated from so the parameter/fluid bindings aren't inherited from the 'parameterize'), and because that other thread doesn't have the binding of your process-monitor parameter, resulting in the issues you noticed.
You can test this by (pk (current-thread)) in the main thread, inside the (run-fibers ...) form (should be redundant, but maybe test it just in case), and just before reading the process-monitor parameter.
If the output isn't the same in all cases, then something like the hypothesis might be going on.
the symptom is that even though the entire daemon is wrapped in the dynamic extent of a
parameterizethat sets this fluid,
So, I looked at shepherd.scm (shepherd 0.10.2), and this turns out to be untrue:
(run-fibers
(lambda ()
(with-service-registry
;; Register and start the 'root' service.
(register-services (list root-service))
(start-service root-service)
(with-process-monitor
;; [...]
(replace-core-bindings! [...]
(run-daemon [...])))))
As you can see, the `run-fibers' is not wrapped in a process monitor. Hence, assuming that 'the process monitor is read/invoked from a signal handler, it is quite plausible that it is invoked outside the extend of the with-process-monitor.
Before the run-fibers (not copied), there is also a whole lot of initialisation going on, which might perhaps be capturing the (uninitialised) process monitor.
Even inside the run-fibers, not everything is wrapped in a process monitor: the register-services and start-service isn't.
(Yes, run-daemon is inside the process monitor, but that's a misleading name, the other stuff is also daemon stuff!).
Looking at the definition of handle-SIGCHLD, the first paragraph appears to be the case. So, seems like a pure Shepherd bug.
Also, performing Fibers operations inside an async is super skeevy. The Fibers stuff might be in an intermediate state, which usually is invisible to Fibers users but potentially not from asyncs, so now it might be messing things up. Or the async might be run inside a fiber that currently is in the progress of doing a put-message of its own, which put-message was not designed to handle, so that might mess things up. Or perhaps it is even run inside the process monitor, while it is doing (get-message (current-process-monitor)) -- I don't know what would happen then, but I doubt it is anything good.
Really, unless it's a pure operation or something ultra-basic like vector-set! or something that's actually documented to be allowed/supported, you shouldn't be assuming you can simply do things from an async.
Please remove this process-monitor stuff or actually implement & document the prerequisites in Fibers first.
Given how Shepherd is messing around with the API, I'll be responding to future Shepherd/Fibers bug reports that are not easily verifiably to be an actual bug in Fibers (*) with ‘then don't misuse the API’ + close.
(*) a small reproducer (not using Shepherd) should do it
@emixa-d thank you very much for looking into this!
i'll move the rest of this discussion to the guix mailing list.
(To be clear, I think that being able to do 'put-message' from an async would be great, the issue is that this is not yet actually supported & implemented -- it might work but more by accident than anything else.)
@emixa-d is there any facility through which an async signal handler could tell some fiber in the fiber universe to wake up? the handler could package up the signal into an object, store it somewhere, and tell a fiber to look at this place.
and if there's nothing like this yet, then do you think polling an atomic variable could work?
i'm rather new to scheme/guile/fibers, so feel free to just give me an RTFM pointer... :)
@emixa-d is there any facility through which an async signal handler could tell some fiber in the fiber universe to wake up? the handler could package up the signal into an object, store it somewhere, and tell a fiber to look at this place.
I don't think any of this is in the manual.
and if there's nothing like this yet, then do you think polling an atomic variable could work?
Make sure not to poll too long because shepherd disables preemption (maybe insert some '(sleep 0)'? Such spinning grossly inefficient, but in principle it would would work to work-around things ... but inefficient. I guess you could avoid the inefficiency by only polling when something else happens anyway (say, the daemon receives some RPC, a service has started, whatever), sounds kind of unsatisfying though.
Perhaps, with some small adjustments, signal-condition (+ an atomic box for the data)? The implementation of signal-condition (or was it perform-operation itself?) has some support for being run outside a fiber (via POSIX condition stuff) -- while I don't think it expects to be run as an async (e.g., signal handler stuff), I guess it would be straightforward to define a variant signal-condition/no-fibers that always does the POSIX path.
It might even be the case that signal-condition works as-is -- compared to the other Fibers operations stuff, it is quite straightforward -- but that requires a careful look, testing (+ documenting for discoverability and making sure that for future changes, people know not to break that).
(Well, signal-condition!, whatever.)
Alt., do the pipe trick.
Though I think the main issue here is that the parameterize doesn't cover the root-service, register-services and with-service-register stuff. (The async stuff is problematic, but appears to be something with a low chance of actually going wrong.) iow,you may need to switch with-service-registry with with-process-monitor.
@emixa-d wow, indeed! i never thought switching them up would make a difference. i only moved the root-service stuff inside the with-process-monitor, but it behaved the same.
thanks for keeping on thinking about this, and proposing an actual fix! :)
Op 17-12-2023 om 00:57 schreef Attila Lendvai:
@emixa-d https://github.com/emixa-d wow, indeed! i never thought switching them up would make a difference. i only moved the |root-service| stuff inside the |with-process-monitor|, but it behaved the same: https://codeberg.org/attila-lendvai-patches/shepherd/commits/branch/asserts https://codeberg.org/attila-lendvai-patches/shepherd/commits/branch/asserts
thanks for keeping on thinking about this, and proposing an actual fix! :)
Warning: I'm not familiar with the implementation of with-process-monitor -- while I think part of the problem is that they need to be switched, it might be possible that with-process-monitor makes use of the service registry, in which case with-service-registry and with-process-monitor may need to be combined in a single unified thing or something.
@emixa-d no, your gut feeling was right. since then the fix is already in shepherd: https://git.savannah.gnu.org/cgit/shepherd.git/commit/?id=9be0b7e6fbe3c2e743b5626f4dff7c7bf9becc16
thanks again!