ChezScheme icon indicating copy to clipboard operation
ChezScheme copied to clipboard

`set-cdr!` can cause `list?` to crash in safe mode

Open LiberalArtist opened this issue 2 years ago • 3 comments

This program (by @mflatt, from https://github.com/racket/racket/issues/4072#issuecomment-988000472) creates an engine that calls list? on a pair which is initially a list. While the engine is suspended, it uses set-cdr! to mutate one of the interior pairs so that it is no longer a list. If the interruption comes at just the right moment, resuming the engine will lead list? to perform an invalid memory reference: the program iteratively finds an amount of fuel that will reveal the error.

(define N 10000)

(define ml (let loop ([i N])
             (if (zero? i)
                 '()
                 (cons i (loop (sub1 i))))))
(define mid (list-tail ml (floor (* N 1/4))))
(define mid-rest (cdr mid))

(define Q 100)

(let find-crash ([j 0])
  (define e (make-engine (lambda () (list? ml))))
  (printf "~s\n" j)
  (set-cdr! mid mid-rest)       
  (e (floor (* (/ j Q) N))
     (lambda (t v) (printf "done! ~s\n" v))
     (lambda (e2)
       (set-cdr! mid 0)
       (e2 N (lambda (t v) (printf "~s\n" v)) (lambda (e2) "still not done"))
       (unless (= j Q)
         (find-crash (add1 j))))))

Example output:

Chez Scheme Version 9.5.4
Copyright 1984-2020 Cisco Systems, Inc.

0
#f
1
#f
2
#f
3
#f
4
#f
5
#f
6
#f
7
#f
8
#f
9
#f
10
#f
11
#f
12
#f
13
Exception: invalid memory reference.  Some debugging context lost

LiberalArtist avatar Dec 08 '21 21:12 LiberalArtist

Thanks for the interesting bug report!

The implementation of list? is not safe in the presence of concurrent modification (via threads or engines). Since it's part of the Chez standard libraries, it's compiled with optimize-level 3, so it really crashes when something goes wrong. If it were compiled at 02 or below then it would still error, but with Exception in cdr: 0 is not a pair.

A fix for this particular case is simple: add a check for (pair? tortoise) to the and just before calling (cdr tortoise). That's obviously less efficient in the common case, but necessary if the function is supposed to be thread or engine -safe. The same tortoise/hare idiom to detect circular lists is used several other places, and one could reasonably assume they all suffer from the same deficiency.

I think maintainers probably need a little discussion to decide whether these functions are intended to be correct in the presence of concurrent modification and if so to identify other functions with the same issue.

jltaylor-us avatar Dec 09 '21 01:12 jltaylor-us

I'd also point to some other nuances highlighted by @gus-massa in https://github.com/racket/racket/issues/4072#issuecomment-986142876 (mcdr, mpair?, and mlist are Racket's spellings for cdr, pair?, and list with mutable pairs.):

I agree that checking (mpair? tortoise) would be nice. Anyway, some hard questions:

If the user replace a mcdr between the tortoise and the hare with ...

... null, then the result should be #t or #f?

... 7l, then the result should be #t or #f?

... a cyclic pseudo-mlist and the original object is another cyclic pseudo-mlist, then the check never ends. Is this fixable? Do we care?

LiberalArtist avatar Dec 09 '21 02:12 LiberalArtist

So far our general consensus is that there is no well-defined result possible in the face of concurrent modification. (My first proposed solution does have the distinction of being definitely bad, in that it could answer #f for a list whose only edit was to change a cell's tail from pointing to one proper list to another. A better solution would be to replace tortoise with hare if it suddenly became a non-pair.)

The only real well-defined behavior I've been able to come up with is to raise an exception if we happen to detect concurrent modification. I'll let you guess what the reaction was when I asked what sort of performance impact it would have to start each call to list? with a call1/cc.

So far it's leaning towards not changing the behavior, but other maintainers could still weigh in with different opinions.

jltaylor-us avatar Dec 09 '21 02:12 jltaylor-us

NB: In SRFI 226, I propose a condition &concurrent-modification so that an implementation can raise an exception of this type when it detects a concurrent modification.

mnieper avatar Nov 04 '22 11:11 mnieper

已收到

Sshanfeng avatar Nov 04 '22 11:11 Sshanfeng

After returning to this issue, I'd like to advocate in favor of the change that @jltaylor-us describes.

I don't see how list operations in general can be well-defined if the list is modified concurrently — at least, not without something dramatically more complex/expensive, like tracking all reads and writes. The solution described above takes care to return #t if the list is extended in a particular way while the list? predicate is in flight, but it would still possible to expose the tortoise—hare internals by carefully moving a pair from the early in the list to later, so that the hare and tortoise are the same and #f is returned, even if all the intermediate states are lists.

So, if we agree that list operations are undefined in the presence of mutation, then list? is free to return a random boolean in the event of mutation. It's nice when safe mode can detect and report an error for undefined behavior, but that's not always so easy — as in this case, since allowing an exception interferes with optimizations — and so it's not always done. But avoiding memory faults is more of a priority.

mflatt avatar Dec 19 '22 06:12 mflatt

The changes that @jltaylor-us described were made as part of the v9.9.9 merge.

mflatt avatar Nov 21 '23 22:11 mflatt