Create Troubleshooting guide that lists errors and causes
This kind of thing https://github.com/omcljs/om/wiki/Troubleshooting
Fits in with this: https://github.com/onyx-platform/onyx/issues/269
Why do retries happen? Brief explanation of retries and then discussion on how to solve them:
- back pressure by reducing max-pending
- increase pending timeout
- onyx/fns that do IO. It may be better to process a whole batch via lifecycles, or to create an output plugin
- may want to increase batch sizes especially for output plugins
- look at metrics / profile and schedule more peers to tasks that are taking the longest
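A sketch of where these knobs live, assuming the 0.8.x catalog keys `:onyx/max-pending`, `:onyx/pending-timeout` and `:onyx/batch-size`. The task name, plugin choice and every value here are placeholders for illustration, not recommendations:

```clojure
;; Hypothetical input task catalog entry. Only the three tuning keys matter
;; for this discussion; everything else is filler to make the entry complete.
(def input-task
  {:onyx/name :read-messages            ; hypothetical task name
   :onyx/plugin :onyx.plugin.core-async/input
   :onyx/type :input
   :onyx/medium :core.async
   ;; Fewer in-flight segments means the input task applies back pressure
   ;; sooner, which gives downstream tasks room to catch up.
   :onyx/max-pending 2000
   ;; Milliseconds to wait for a segment to be fully acked before it is
   ;; retried. Raise it if segments legitimately take a long time.
   :onyx/pending-timeout 120000
   ;; Number of segments read per batch.
   :onyx/batch-size 50
   :onyx/max-peers 1
   :onyx/doc "Reads segments from a core.async channel"})
```

The guide should probably point out that lowering `:onyx/max-pending` and raising `:onyx/pending-timeout` trade throughput and retry latency against each other, so they are best tuned together while watching metrics.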
Closed publications: display an example exception. Generally caused by a timeout after a GC pause or the laptop sleeping. Mention the config to increase the timeout.
Aeron message too large exception, show Java properties you can set.
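For the Java properties, a possible shape for the example, assuming a Leiningen project and that the relevant setting is Aeron's `aeron.term.buffer.length` (Aeron caps the maximum message length at a fraction of the term buffer length). The var name and the value are illustrative only:

```clojure
;; Hypothetical JVM options; these would go under :jvm-opts in project.clj
;; for the JVM that runs the Aeron media driver.
(def aeron-jvm-opts
  ;; 32 MiB term buffer. Aeron expects this length to be a power of two.
  ["-Daeron.term.buffer.length=33554432"])
```

A larger term buffer also means more memory per publication, so the guide should mention that cost next to the setting.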
Running out of disk space as a result of the media driver in Docker. Probably a /dev/shm issue; see https://gist.github.com/bsima/e5f5952350259d66209d and https://github.com/onyx-platform/onyx-template/blob/0.8.x/resources/leiningen/new/onyx_app/run-peers.sh#L7
Onyx log grew too large: https://gist.github.com/lbradstreet/732b1b5e99cbf68b37dc
Starvation issues: increase Aeron timeouts, switch to a shared media driver, use G1GC
- My program won't get off the ground at all.
  - [x] `java.lang.IllegalStateException: aeron cnc file version not understood`
  - [x] `java.io.IOException: No space left on device`
    - [x] Are you running in Docker?
      - [x] start docker run with a bigger `--shm-size`
  - [x] `Failed to connect to the Media Driver - is it currently running?`
  - [x] `No implementation of method: :read-char of protocol: #'clojure.tools.reader.reader-types/Reader found for class`
    - [x] Does your `:onyx/fn` return something that isn't a map, or a vector of maps? Are all of the elements in those maps EDN serializable?
  - [x] `org.apache.bookkeeper.bookie.BookieException$InvalidCookieException: Cookie`
    - [x] deleting `/tmp/bookkeeper_journal` and `/tmp/bookkeeper_ledger`
- [x] My program started, but isn't behaving how I expected it would.
  - [x] Work never started
    - [x] Too few peers
    - [x] Did you pick the right scheduler?
  - [x] Peers say they're "warming up"
    - [x] Do you have a start-lifecycle? call that isn't returning true?
  - [x] I'm not seeing any messages being processed
    - [x] Which plugin are you using for input?
      - [x] Kafka
        - [x] Are you starting the Onyx peer at an appropriate offset?
        - [x] Use :onyx/fn to print debug all messages coming in
  - [x] Messages are being replayed multiple times
    - [x] Did you set pending-timeout to something appropriate?
  - [x] My program starts running, but then stalls
    - [x] Did you check onyx.log for any exceptions?
    - [x] Are you using core.async for output?
      - [x] Is the output channel full?
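
A few small helpers that could accompany the checklist items about `:onyx/fn` return values, debugging incoming messages, and the start lifecycle call. This is a sketch: all function and var names are hypothetical, the EDN round trip is only an approximation of what Onyx needs from a segment, and it assumes the lifecycle hook is registered under the `:lifecycle/start-task?` key.

```clojure
(require '[clojure.edn :as edn])

;; Pass-through :onyx/fn for debugging: prints each segment and returns it
;; unchanged so the rest of the workflow is unaffected.
(defn debug-segment
  [segment]
  (println "segment:" (pr-str segment))
  segment)

;; Rough check that a function's return value is a map or a sequence of maps.
(defn valid-fn-output?
  [x]
  (or (map? x)
      (and (sequential? x) (every? map? x))))

;; Rough check that a segment survives printing and re-reading as EDN.
;; Values with no EDN representation (e.g. arbitrary Java objects) fail.
(defn edn-round-trips?
  [segment]
  (try
    (= segment (edn/read-string (pr-str segment)))
    (catch Exception _ false)))

;; Lifecycle hook: the task only starts once this returns a truthy value.
(defn start-task?
  [event lifecycle]
  true)

(def debug-lifecycle-calls
  {:lifecycle/start-task? start-task?})
```

For example, `(valid-fn-output? (debug-segment {:n 1}))` exercises the first two helpers on a single segment, and `(edn-round-trips? {:n (Object.)})` shows the kind of value the serialization question is meant to catch.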
Landed a bunch of this on develop. See https://github.com/onyx-platform/onyx/blob/e4f432364c9ce3c7b75271f04c8bbbf21fb487d4/doc/user-guide/faq.md.