Create Troubleshooting guide that lists errors and causes

Open lbradstreet opened this issue 10 years ago • 9 comments

This kind of thing https://github.com/omcljs/om/wiki/Troubleshooting

lbradstreet avatar Nov 26 '15 10:11 lbradstreet

Fits in with this: https://github.com/onyx-platform/onyx/issues/269

lbradstreet avatar Nov 26 '15 10:11 lbradstreet

Why do retries happen? Give a brief explanation of retries, then discuss how to reduce them:

  • apply back pressure by reducing :onyx/max-pending
  • increase :onyx/pending-timeout
  • :onyx/fns that do IO: it may be better to process a whole batch via lifecycles, or to create an output plugin
  • consider increasing batch sizes, especially for output plugins
  • look at metrics or profile, and schedule more peers to the tasks that take the longest
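
A rough sketch of where the first few knobs live, assuming a pre-0.10 Onyx input task map. The task name `:in`, the choice of the core.async input plugin, and every value are illustrative only, not recommendations:

```clojure
;; Hypothetical input task entry in a catalog; :in and all values are made up.
{:onyx/name :in
 :onyx/plugin :onyx.plugin.core-async/input
 :onyx/type :input
 :onyx/medium :core.async
 ;; Fewer segments allowed in flight means more back pressure on the input.
 :onyx/max-pending 2000
 ;; Milliseconds to wait for a segment to complete before it is retried.
 :onyx/pending-timeout 120000
 ;; Larger batches amortise per-batch overhead.
 :onyx/batch-size 50
 :onyx/doc "Reads segments from a core.async channel"}
```

Lowering `:onyx/max-pending` and raising `:onyx/pending-timeout` pull in the same direction, so it is probably worth changing one at a time and watching the retry metrics.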

lbradstreet avatar Nov 26 '15 14:11 lbradstreet

Closed publications: display an example exception. Generally caused by a timeout, triggered by GC pauses or the laptop sleeping. Mention the config to increase the timeout.
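
One way to raise the timeout, sketched as a Leiningen `project.clj` fragment. It assumes the timeout in question is one of Aeron's liveness timeouts, which are set through Java system properties in nanoseconds. The project name and the 50-second values are placeholders, so check the Aeron configuration documentation for the version in use:

```clojure
(defproject my-onyx-app "0.1.0-SNAPSHOT"   ; hypothetical project name
  ;; Assumed Aeron system properties, values in nanoseconds (50 s here).
  :jvm-opts ["-Daeron.client.liveness.timeout=50000000000"
             "-Daeron.image.liveness.timeout=50000000000"])
```

If the media driver runs in a separate process, the same properties presumably need to be passed to that JVM as well.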

lbradstreet avatar Nov 26 '15 14:11 lbradstreet

Aeron message too large exception, show Java properties you can set.
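
A sketch of how the property value could be chosen. It assumes Aeron limits a message to one eighth of the term buffer length, that the length must be a power of two, and that the relevant system property is `aeron.term.buffer.length`. Both function names are hypothetical helpers, not part of Onyx or Aeron:

```clojure
(defn term-buffer-length-for
  "Smallest power-of-two term buffer length (bytes) whose one-eighth share
  can hold a message of max-message-bytes. Starts from 64 KiB, assumed here
  to be Aeron's minimum term length."
  [max-message-bytes]
  (let [needed (* 8 max-message-bytes)]
    (loop [len (* 64 1024)]
      (if (>= len needed)
        len
        (recur (* 2 len))))))

(defn term-buffer-jvm-opt
  "Formats the JVM flag for the largest message size you expect to send."
  [max-message-bytes]
  (str "-Daeron.term.buffer.length=" (term-buffer-length-for max-message-bytes)))

;; (term-buffer-jvm-opt 3000000)
;; => "-Daeron.term.buffer.length=33554432"
```

Larger term buffers use more shared memory per publication, so this trades memory for headroom.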

lbradstreet avatar Nov 26 '15 14:11 lbradstreet

Running out of disk space as a result of the media driver in Docker. Probably a /dev/shm issue; see https://gist.github.com/bsima/e5f5952350259d66209d and https://github.com/onyx-platform/onyx-template/blob/0.8.x/resources/leiningen/new/onyx_app/run-peers.sh#L7

lbradstreet avatar Dec 03 '15 15:12 lbradstreet

Onyx log grew too large: https://gist.github.com/lbradstreet/732b1b5e99cbf68b37dc

lbradstreet avatar Dec 04 '15 16:12 lbradstreet

Starvation issues: increase Aeron timeouts, switch to shared media driver, use G1GC
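
A peer-config fragment sketching one reading of the media driver suggestion. It assumes "shared" refers to the embedded media driver's threading mode and that the key below exists in the Onyx version in use, so verify both against the peer-config documentation before relying on it:

```clojure
;; Peer-config fragment; merge into an existing peer-config map.
;; Assumption: "shared" means the media driver's threading mode, so the
;; driver's threads do not each need a dedicated core.
{:onyx.messaging/impl :aeron
 :onyx.messaging.aeron/embedded-driver? true
 :onyx.messaging.aeron/embedded-media-driver-threading :shared}
;; G1 is enabled separately, with the JVM flag -XX:+UseG1GC.
```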

lbradstreet avatar Jan 20 '16 15:01 lbradstreet

  • My program won't get off the ground at all.
    • [x] java.lang.IllegalStateException: aeron cnc file version not understood
    • [x] java.io.IOException: No space left on device
      • [x] Are you running in Docker?
        • [x] start docker run with a bigger --shm-size
    • [x] Failed to connect to the Media Driver - is it currently running?
    • [x] No implementation of method: :read-char of protocol: #'clojure.tools.reader.reader-types/Reader found for class
      • [x] Does your :onyx/fn return something that isn't a map, or a vector of maps? Are all of the elements in those maps EDN serializable?
    • [x] org.apache.bookkeeper.bookie.BookieException$InvalidCookieException: Cookie
      • [x] Try deleting /tmp/bookkeeper_journal and /tmp/bookkeeper_ledger
  • [x] My program started, but isn't behaving how I expected it would.
    • [x] Work never started
      • [x] Too few peers
      • [x] Did you pick the right scheduler?
    • [x] Peers say they're "warming up"
      • [x] Do you have a start-lifecycle? call that isn't returning true?
    • [x] I'm not seeing any messages being processed
      • [x] Which plugin are you using for input?
        • [x] Kafka
          • [x] Are you starting the Onyx peer at an appropriate offset?
          • [x] Use an :onyx/fn to debug-print all incoming messages
    • [x] Messages are being replayed multiple times
      • [x] Did you set pending-timeout to something appropriate?
    • [x] My program starts running, but then stalls
      • [x] Did you check onyx.log for any exceptions?
      • [x] Are you using core.async for output?
        • [x] Is the output channel full?
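
Two of the checks above (the shape of an :onyx/fn return value, and debug-printing incoming segments) can be tried at a REPL with something like the following. All three function names are hypothetical, and the EDN round trip is only a rough stand-in for whatever serialization the peer actually performs:

```clojure
(require '[clojure.edn :as edn])

(defn valid-segments?
  "True when x is a map or a sequential collection of maps, the shapes
  an :onyx/fn is expected to return."
  [x]
  (or (map? x)
      (and (sequential? x) (every? map? x))))

(defn edn-round-trips?
  "True when x survives being printed and read back as EDN unchanged."
  [x]
  (try
    (= x (edn/read-string (pr-str x)))
    (catch Exception _ false)))

(defn log-segment
  "Pass-through :onyx/fn for debugging: prints the segment, returns it unchanged."
  [segment]
  (println "segment:" (pr-str segment))
  segment)
```

`log-segment` can be wired in as the `:onyx/fn` of a function task placed directly after the input task, then removed once messages are confirmed to be flowing.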

MichaelDrogalis avatar Jun 14 '16 14:06 MichaelDrogalis

Landed a bunch of this on develop. See https://github.com/onyx-platform/onyx/blob/e4f432364c9ce3c7b75271f04c8bbbf21fb487d4/doc/user-guide/faq.md.

MichaelDrogalis avatar Jun 17 '16 01:06 MichaelDrogalis