fibers icon indicating copy to clipboard operation
fibers copied to clipboard

Memory leak on choice operation of 'get' and 'sleep'

Open civodul opened this issue 1 year ago • 1 comments

The code below exhibits a memory leak with Fibers 1.3.1:

(use-modules (fibers)
             (fibers channels)
             (fibers operations)
             (fibers timers)
             (ice-9 textual-ports)
             (ice-9 match)
             (rnrs bytevectors))

(include "heap-profiler.scm")
(sigaction SIGINT
  (lambda _
    (profile-heap)
    (primitive-exit 0)))

(setvbuf (current-output-port) 'none)
(run-fibers
 (lambda ()
   (define channel
     (make-channel))

   (spawn-fiber
    (lambda ()
      (let loop ()
        (put-message channel 'hello)
        (sleep 0.0001)
        (loop))))

   (spawn-fiber
    (lambda ()
      (let loop ()
        (format #t "~%heap: ~a MiB~%"
                (round
                 (/ (assoc-ref (gc-stats) 'heap-size) (expt 2. 20))))
        (sleep 1)
        (loop))))

   (let loop ()
     (perform-operation
      (choice-operation (wrap-operation (sleep-operation 100)
                                        (const 'timeout))
                        (get-operation channel)))
     (loop)))
 #:hz 0
 #:parallelism 1)

(The heap-profiler.scm file comes from here.)

A typical run looks like this:

$ guile fibers-leak.scm 
WARNING: (guile-user): imported module (fibers) overrides core binding `sleep'

heap: 4.0 MiB

heap: 7.0 MiB

heap: 9.0 MiB

heap: 12.0 MiB

heap: 16.0 MiB

heap: 16.0 MiB

heap: 22.0 MiB

heap: 21.0 MiB

heap: 22.0 MiB

heap: 29.0 MiB

heap: 29.0 MiB

heap: 29.0 MiB

heap: 37.0 MiB

heap: 37.0 MiB

heap: 37.0 MiB
  C-c C-c
;;; (#<procedure heap-sections ()>)

;;; (sampling 100000)
  %   type                               self    avg obj size
 54.4 program                             1,565,456    32.1
 19.8 pair                                  571,072    16.0
  8.0 unknown                               229,216  1776.9
  7.0 struct                                201,056    51.6
  2.5 partial-continuation                   70,816    32.0
  2.0 pointer                                56,560    16.0
  1.8 vector                                 52,363    50.8
  0.8 stringbuf                              23,744    68.2
  0.8 vm-continuation                        23,152    17.3
  0.7 symbol                                 21,408    32.0
  0.7 atomic-box                             20,640    17.2
  0.4 bytevector                             11,120   358.7
  0.2 variable                                6,608    20.5
  0.2 string                                  6,080    32.0
  0.2 smob                                    5,344    32.0
  0.2 heap-number                             4,496    30.2
  0.1 hash-table                              4,256    32.0
  0.1 weak-table                              3,600    28.8
  0.0 primitive                                 768    16.0
  0.0 weak-vector                               384    19.2
  0.0 bitvector                                 224    37.3
  0.0 keyword                                   192    16.0
  0.0 dynamic-state                             176    16.0
  0.0 primitive-generic                          96    32.0
  0.0 fluid                                      48    16.0
  0.0 port                                       32    32.0
sampled heap: 2.74554 MiB (heap size: 36.64453 MiB)

This suggests delimited continuations are being accumuated somewhere. It's all fine when commenting out the sleep operation.

civodul avatar Sep 08 '24 08:09 civodul

This is because the "block" function of timer-operation calls schedule-task-at-time, adding a thunk to the scheduler's queue, and there's nothing that undoes this when the operation "loses" in a choice operation.

Eventually that thunk gets scheduled and removed from the run queue, but that might take too long and cause the memory leak we're seeing.

In the example above, changing from (sleep-operation 100) to (sleep-operation 1) makes a big difference: heap size quickly stabilizes because those timers are removed every second, which is fast enough.

It seems to me that there needs to be a way to evacuate those sleep-operation timers explicitly when a sleep operation is "canceled", to avoid that problem, but operations don't have a "cancel" function. Thoughts? Cc: @wingo

civodul avatar Sep 08 '24 09:09 civodul

(Assigning the ... assignment based on what's happening)

emixa-d avatar Oct 30 '24 19:10 emixa-d

This is because the "block" function of timer-operation calls schedule-task-at-time, adding a thunk to the scheduler's queue, and there's nothing that undoes this when the operation "loses" in a choice operation.

It seems to me that there needs to be a way to evacuate those sleep-operation timers explicitly when a sleep operation is "canceled", to avoid that problem, but operations don't have a "cancel" function. Thoughts? Cc: @wingo

This is what the unique-to-Concurrent ML basic combinator withNack (spelled nack-guard-evt in Racket) is supposed to provide!

See https://github.com/wingo/fibers/issues/97 and Reppy, Concurrent Programming in ML § 6.5.

LiberalArtist avatar Oct 30 '24 20:10 LiberalArtist