fibers
fibers copied to clipboard
#:drain? #t does not wait until all fibers have completed.
No threads, no operations, only run-fibers with #:drain? #t and spawn-fiber. FWIW, the bug doesn't manifest when disabling preemption (#:hz 0). Anyway, here is a test case waiting to succeed:
;;;; Copyright (C) 2021 Maxime Devos
;;;;
;;;; This library is free software; you can redistribute it and/or
;;;; modify it under the terms of the GNU Lesser General Public
;;;; License as published by the Free Software Foundation; either
;;;; version 3 of the License, or (at your option) any later version.
;;;;
;;;; This library is distributed in the hope that it will be useful,
;;;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
;;;; Lesser General Public License for more details.
;;;;
;;;; You should have received a copy of the GNU Lesser General Public
;;;; License along with this library; if not, write to the Free Software
;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
(use-modules (ice-9 control)
(fibers)
(srfi srfi-43)
(srfi srfi-64)
(rnrs bytevectors))
(define do-nothing-but-dont-tell-guile
(eval '(lambda _ (values)) (current-module)))
(define N_ITERATIONS 20000)
;; (* 2 (current-processor-count)) on the system
;; the test was written on (and failing), in case
;; that matters.
(define N_THREAD 8)
(define (thread thread-index)
"Do things, make garbage and return #t."
(pk 'fiber-start thread-index)
(do ((i 0 (+ 1 i)))
((>= i N_ITERATIONS))
(do-nothing-but-dont-tell-guile thread-index i)
;; Curiously, if I remove the following expression,
;; then the bug consistently does not manifest. Some connection
;; to garbage management perhaps?
;;
;; Using (cons thread-index i) also manifests the bug,
;; but (subjectively) less frequently.
(do-nothing-but-dont-tell-guile
(make-bytevector 600)))
;; If the test fails and you look at the output,
;; notice that not all (fiber-start INDEX) are matched
;; by a (fiber-end INDEX).
(pk 'fiber-end thread-index)
#t)
(define (check-all-done per-thread)
(define (bad? a)
(not (eq? a #t)))
(define bad (vector-index bad? per-thread))
(if bad
(begin (pk 'bad!! bad (vector-ref per-thread bad))
#f)
#t))
(test-assert "all fibers are xrun to completion"
(let ((thread-indices (iota N_THREAD))
;; When a thread has completed, an element
;; of this vector is set to #t.
(thread-done? (make-vector (+ 1 N_THREAD))))
(run-fibers
(lambda ()
(define (run! thread-index)
(spawn-fiber
(lambda ()
(vector-set! thread-done? thread-index
(thread thread-index)))))
(for-each run! thread-indices))
#:drain? #t
;; No need
#:install-suspendable-ports? #f
#:hz 700)
(vector-set! thread-done? N_THREAD #t)
(check-all-done thread-done?)))
Pinging @amirouche since someone on IRC suggested so. Your N_THREAD and N_ITERATIONS may vary. Also, this test case occasionally succeeds.
Oh but it does not rely on POSIX threads? Then I do not know.
This is for version 1.0.0 btw. Didn't test on master.
I can reproduce on master (44d17e64e581fb281bae1390f74c762ecf089b58) (GNU Guile 2.2.7) as well.
My suspicion:
When creating a scheduler with parallelism (make-scheduler #:parallelism N)
, it created a few other schedulers internally, which do not share a workqueue.
For testing whether the fibers have finished (see scheduler-work-pending & run-fibers), scheduler-work-pending? is used. But that procedure only looks at the workqueues of only one scheduler, and all other schedulers (co-schedulers?) are ignored!
Seems like we need to call scheduler-work-pending? on all schedulers in run-fibers.