fibers icon indicating copy to clipboard operation
fibers copied to clipboard

#:drain? #t does not wait until all fibers have completed.

Open emixa-d opened this issue 3 years ago • 4 comments

No threads, no operations, only run-fibers with #:drain? #t and spawn-fiber. FWIW, the bug doesn't manifest when disabling preemption (#:hz 0). Anyway, here is a test case waiting to succeed:

;;;; Copyright (C) 2021 Maxime Devos
;;;;
;;;; This library is free software; you can redistribute it and/or
;;;; modify it under the terms of the GNU Lesser General Public
;;;; License as published by the Free Software Foundation; either
;;;; version 3 of the License, or (at your option) any later version.
;;;;
;;;; This library is distributed in the hope that it will be useful,
;;;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
;;;; Lesser General Public License for more details.
;;;;
;;;; You should have received a copy of the GNU Lesser General Public
;;;; License along with this library; if not, write to the Free Software
;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

(use-modules (ice-9 control)
             (fibers)
             (srfi srfi-43)
             (srfi srfi-64)
             (rnrs bytevectors))

(define do-nothing-but-dont-tell-guile
  (eval '(lambda _ (values)) (current-module)))

(define N_ITERATIONS 20000)
;; (* 2 (current-processor-count)) on the system
;; the test was written on (and failing), in case
;; that matters.
(define N_THREAD 8)

(define (thread thread-index)
  "Do things, make garbage and return #t."
  (pk 'fiber-start thread-index)
  (do ((i 0 (+ 1 i)))
      ((>= i N_ITERATIONS))
    (do-nothing-but-dont-tell-guile thread-index i)
    ;; Curiously, if I remove the following expression,
    ;; then the bug consistently does not manifest. Some connection
    ;; to garbage management perhaps?
    ;;
    ;; Using (cons thread-index i) also manifests the bug,
    ;; but (subjectively) less frequently.
    (do-nothing-but-dont-tell-guile
     (make-bytevector 600)))
  ;; If the test fails and you look at the output,
  ;; notice that not all (fiber-start INDEX) are matched
  ;; by a (fiber-end INDEX).
  (pk 'fiber-end thread-index)
  #t)

(define (check-all-done per-thread)
  (define (bad? a)
    (not (eq? a #t)))
  (define bad (vector-index bad? per-thread))
  (if bad
      (begin (pk 'bad!! bad (vector-ref per-thread bad))
             #f)
      #t))

(test-assert "all fibers are xrun to completion"
  (let ((thread-indices (iota N_THREAD))
        ;; When a thread has completed, an element
        ;; of this vector is set to #t.
        (thread-done? (make-vector (+ 1 N_THREAD))))
    (run-fibers
     (lambda ()
       (define (run! thread-index)
         (spawn-fiber
          (lambda ()
            (vector-set! thread-done? thread-index
                         (thread thread-index)))))
       (for-each run! thread-indices))
     #:drain? #t
     ;; No need
     #:install-suspendable-ports? #f
     #:hz 700)
    (vector-set! thread-done? N_THREAD #t)
    (check-all-done thread-done?)))

Pinging @amirouche since someone on IRC suggested so. Your N_THREAD and N_ITERATIONS may vary. Also, this test case occasionally succeeds.

emixa-d avatar Jun 03 '21 19:06 emixa-d

Oh but it does not rely on POSIX threads? Then I do not know.

amirouche avatar Jun 04 '21 11:06 amirouche

This is for version 1.0.0 btw. Didn't test on master.

emixa-d avatar Jun 04 '21 16:06 emixa-d

I can reproduce on master (44d17e64e581fb281bae1390f74c762ecf089b58) (GNU Guile 2.2.7) as well.

emixa-d avatar Jun 04 '21 16:06 emixa-d

My suspicion:

When creating a scheduler with parallelism (make-scheduler #:parallelism N), it created a few other schedulers internally, which do not share a workqueue.

For testing whether the fibers have finished (see scheduler-work-pending & run-fibers), scheduler-work-pending? is used. But that procedure only looks at the workqueues of only one scheduler, and all other schedulers (co-schedulers?) are ignored!

Seems like we need to call scheduler-work-pending? on all schedulers in run-fibers.

emixa-d avatar Jun 04 '21 18:06 emixa-d