nim-chronos icon indicating copy to clipboard operation
nim-chronos copied to clipboard

index error in readStreamLoop of transports/stream

Open michaelsbradleyjr opened this issue 2 years ago • 3 comments

I'm experiencing runtime crashes from unhandled exceptions re: nim-chronos, with L1436 in transports/stream.nim reported as the culprit:

/Users/michael/repos/nim-dagger/experiments/localstore.nim(117) localstore
/Users/michael/.nimble/pkgs/chronos-#export-selector-field/chronos/asyncloop.nim(1091) waitFor
/Users/michael/.nimble/pkgs/chronos-#export-selector-field/chronos/asyncloop.nim(279) poll
/Users/michael/repos/nim-dagger/experiments/localstore.nim(114) main
/Users/michael/.nimble/pkgs/chronos-#export-selector-field/chronos/asyncloop.nim(279) poll
/Users/michael/.nimble/pkgs/chronos-#export-selector-field/chronos/transports/stream.nim(1436) readStreamLoop
/Users/michael/.choosenim/toolchains/nim-1.6.0/lib/system/fatal.nim(53) sysFatal
[[reraised from:
/Users/michael/repos/nim-dagger/experiments/localstore.nim(117) localstore
/Users/michael/.nimble/pkgs/chronos-#export-selector-field/chronos/asyncloop.nim(1091) waitFor
/Users/michael/.nimble/pkgs/chronos-#export-selector-field/chronos/asyncloop.nim(279) poll
]]
Error: unhandled exception: index 4096 not in 0 .. 4095 [IndexDefect]

I've tried with Nim v1.2.14, v1.4.8, v1.6.0 — same thing regardless of version.

When I (naively) change L1435 from:

while true:

to:

while transp.offset < transp.buffer.len:

I no longer experience that runtime crash.

Context

I'm doing some early stage experiments combining use of task_runner impl_beta_2 branch's threadpool with chronos' http server and async pipes:

https://github.com/michaelsbradleyjr/nim-dagger/blob/localstore/experiments/localstore.nim#L77-L78

Re: those two lines in particular, I only experience the runtime crash (i.e. without change mentioned above) if L77 is enabled and L78 is disabled, i.e. only when the Future returned by writer.write is not discarded.


I can create separate issues re: the following, but this issue feels like a decent place to start the discussion/s...

(1) When composing async streams, is the way I did it in experiments/localstore.nim, via readMessage and a predicate, the way to go about it? Or is there another API for that purpose? I spent awhile looking through the tests and implementations of nim-chronos, but I may have missed something or a lot of things.

(2) Is there a reason why ReadMessagePredicate of transports/stream.nim does not return a Future? If it did, then my L77-L78 linked above could be replaced with:

asyncSpawn writer.write(@data, data.len)

And the index error would get re-raised as a Defect per asyncSpawn.

(3) When the crash reported by this issue is "fixed", or ignored by discarding the Future returned by writer.write, I'm experiencing a similar looking error when I try to increase the number of concurrent large-ish uploads to the running server (~7 MB per POST request) from a couple of dozen to several dozen:

/Users/michael/repos/nim-dagger/experiments/localstore.nim(117) localstore
/Users/michael/.nimble/pkgs/chronos-#export-selector-field/chronos/asyncloop.nim(1091) waitFor
/Users/michael/.nimble/pkgs/chronos-#export-selector-field/chronos/asyncloop.nim(279) poll
/Users/michael/repos/nim-dagger/experiments/localstore.nim(114) main
/Users/michael/.nimble/pkgs/chronos-#export-selector-field/chronos/asyncloop.nim(279) poll
/Users/michael/.nimble/pkgs/chronos-#export-selector-field/chronos/apps/http/httpserver.nim(748) processLoop
/Users/michael/repos/nim-dagger/experiments/localstore.nim(68) process
/Users/michael/.nimble/pkgs/chronos-#export-selector-field/chronos/transports/stream.nim(2516) fromPipe
/Users/michael/.nimble/pkgs/chronos-#export-selector-field/chronos/asyncloop.nim(560) register
/Users/michael/.nimble/pkgs/chronos-#export-selector-field/chronos/ioselects/ioselectors_kqueue.nim(221) registerHandle
/Users/michael/.choosenim/toolchains/nim-1.6.0/lib/system/fatal.nim(53) sysFatal
[[reraised from:
/Users/michael/repos/nim-dagger/experiments/localstore.nim(117) localstore
/Users/michael/.nimble/pkgs/chronos-#export-selector-field/chronos/asyncloop.nim(1091) waitFor
/Users/michael/.nimble/pkgs/chronos-#export-selector-field/chronos/asyncloop.nim(279) poll
/Users/michael/repos/nim-dagger/experiments/localstore.nim(114) main
/Users/michael/.nimble/pkgs/chronos-#export-selector-field/chronos/asyncloop.nim(279) poll
/Users/michael/.nimble/pkgs/chronos-#export-selector-field/chronos/apps/http/httpserver.nim(748) processLoop
]]
[[reraised from:
/Users/michael/repos/nim-dagger/experiments/localstore.nim(117) localstore
/Users/michael/.nimble/pkgs/chronos-#export-selector-field/chronos/asyncloop.nim(1091) waitFor
/Users/michael/.nimble/pkgs/chronos-#export-selector-field/chronos/asyncloop.nim(279) poll
/Users/michael/repos/nim-dagger/experiments/localstore.nim(114) main
/Users/michael/.nimble/pkgs/chronos-#export-selector-field/chronos/asyncloop.nim(279) poll
/Users/michael/.choosenim/toolchains/nim-1.6.0/lib/system/excpt.nim(144) processLoop
]]
[[reraised from:
/Users/michael/repos/nim-dagger/experiments/localstore.nim(117) localstore
/Users/michael/.nimble/pkgs/chronos-#export-selector-field/chronos/asyncloop.nim(1091) waitFor
/Users/michael/.nimble/pkgs/chronos-#export-selector-field/chronos/asyncloop.nim(279) poll
/Users/michael/repos/nim-dagger/experiments/localstore.nim(114) main
/Users/michael/.nimble/pkgs/chronos-#export-selector-field/chronos/asyncloop.nim(279) poll
]]
[[reraised from:
/Users/michael/repos/nim-dagger/experiments/localstore.nim(117) localstore
/Users/michael/.nimble/pkgs/chronos-#export-selector-field/chronos/asyncloop.nim(1091) waitFor
/Users/michael/.nimble/pkgs/chronos-#export-selector-field/chronos/asyncloop.nim(279) poll
]]
Error: unhandled exception: index -1 not in 0 .. 49151 [IndexDefect]

I've tried with Nim v1.2.14, v1.4.8, v1.6.0 — same thing regardless of version. With v1.6.0, if I add --gc:orc I get the same crash but the stacktrace is missing, I only get the last line about index -1 ... and it's sometimes garbled with other characters.

This one surely deserves its own bug report, but the reason I mention it here is that it has a (maybe superficial) resemblance to the problem reported at the top of this issue, i.e. involves another loop processor and index error.

michaelsbradleyjr avatar Nov 22 '21 21:11 michaelsbradleyjr

Oooh, this is a bug of the kind that's been with us for quite a while - I'm not sure were never able to track it down in a satisfactory way - do you have a stable repro?

arnetheduck avatar Nov 23 '21 07:11 arnetheduck

do you have a stable repro?

Yes. I'll give the steps to repro, and if it seems too hacky then I'll try to make a branch where the repro is more self-contained, i.e. less need to do prep work.

  1. Clone my fork of nim-dagger and switch to the localstore branch: https://github.com/michaelsbradleyjr/nim-dagger/tree/localstore.

  2. Make or find some several+ MB file that you'll be uploading to the experimental web server + task_runner. And you'll want a script like this:

upload.sh

#!/usr/bin/env bash

filename="$(basename -- "$1")"
extension="${filename##*.}"
filename="${filename%.*}"

for x in {01..24}; do
  curl -i -d @$1 -H "Content-Type: application/json" -X POST \
    localhost:30080/upload/$filename$x.$extension &
done
  1. In experiments/localstore.nim, enable L77 and disable L78; and disable L112.

  2. Run mkdir -p experiments/files && nimble localstore.

  3. After (4) has invoked the built experiments/localstore binary, in another terminal run upload.sh [your file]

Shortly after invoking upload.sh, in the terminal running nimble localstore you should see localstore crash, and the output will include:

Error: unhandled exception: index 4096 not in 0 .. 4095 [IndexDefect]

michaelsbradleyjr avatar Nov 23 '21 18:11 michaelsbradleyjr

@michaelsbradleyjr I think we finally found the root cause: https://github.com/status-im/nim-chronos/pull/267 - can you run your test again?

arnetheduck avatar Apr 11 '22 09:04 arnetheduck