Style guide and cookbook
Per some Gitter discussion, a guide to when to use certain features, when more than one could do the job, would be helpful.
Onyx don't: submit only one segment to a job and let it multiply by many orders of magnitude within the job. This approach completely gives up fault tolerance and max-pending's natural backpressure.
Onyx do: track retry counts via metrics. If retries climb too high, lower max-pending on your inputs or increase the pending timeout.
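As a sketch, these knobs live on the input task's catalog entry. The task name here is hypothetical and the core.async plugin is used purely as a placeholder; the :onyx/* keys are the standard pre-0.10 settings:

```clojure
;; Hypothetical input catalog entry showing the backpressure knobs.
{:onyx/name :read-input
 :onyx/plugin :onyx.plugin.core-async/input ; placeholder plugin
 :onyx/type :input
 :onyx/medium :core.async
 :onyx/max-pending 5000       ; lower this if retries climb
 :onyx/pending-timeout 60000  ; ms before an unacked segment is retried
 :onyx/batch-size 50
 :onyx/batch-timeout 1000}    ; ms
```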
Q: why am I seeing too many retries (otherwise known as: why are some things coming out of order?!)? A: max-pending may be too high. Make sure you're batching adequately (writers may be falling behind), or the per-batch overhead of small batches may be too large.
Q: why is my complete latency so high? A: a few possibilities:
- you have too many intermediate tasks, or they each generate too many intermediate segments
- your throughput isn't high enough
- your batch timeout may be too high (if you're not high volume, you may be hitting the batch timeout before emitting)
- you may want to reduce the pending-timeout if your retries are turning around too slowly (but be careful you don't start a retry storm)
Q: why do I X? A: first, get metrics set up.
Q: how do I filter out segments? A: use flow conditions, or an onyx/fn that returns an empty vector.
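A sketch of both approaches; the task names, the `:age` key, and the threshold are all hypothetical, while the :flow/* keys and the four-argument predicate arity follow the flow conditions documentation:

```clojure
;; Option 1: a flow condition that only routes segments matching a predicate.
(defn adult? [event old-segment new-segment all-new]
  (>= (:age new-segment) 18))

{:flow/from :process-people
 :flow/to [:write-output]
 :flow/predicate ::adult?}

;; Option 2: an :onyx/fn that returns the segment to keep it,
;; or an empty vector to drop it.
(defn keep-adults [segment]
  (if (>= (:age segment) 18)
    segment
    []))
```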
Q: how do I add extra behaviour to my tasks outside of onyx/fn? A: have a look at the lifecycle docs (link).
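For instance, a lifecycle can open a resource before a task starts and close it after the task stops. In this sketch `connect-to-db`, `disconnect`, and the task name are hypothetical; the :lifecycle/* keys are the documented hook names:

```clojure
;; before-task-start returns a map that is merged into the event map.
(defn inject-conn [event lifecycle]
  {:my/conn (connect-to-db)}) ; connect-to-db is hypothetical

(defn close-conn [event lifecycle]
  (disconnect (:my/conn event)) ; disconnect is hypothetical
  {})

(def db-calls
  {:lifecycle/before-task-start inject-conn
   :lifecycle/after-task-stop close-conn})

;; Lifecycle entry wiring the calls map to a task:
{:lifecycle/task :write-output
 :lifecycle/calls ::db-calls}
```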
Q: how should I benchmark on a single machine? A: definitely turn off messaging short-circuiting (link), but only do this for benchmarking.
Q: some performance question A: https://github.com/onyx-platform/onyx/blob/0.7.x/doc/user-guide/performance-tuning.md
(possibly should revise with some of these answers)
Q: how do I ensure that after I kill a job and start a new one, it picks back up where it left off? A: look into the checkpointing features of your plugin.
Related Q: how do I do rolling deploys? A: insert best practices here.
Onyx do: prefer 3-4 smaller nodes over one bigger node; it's better for fault tolerance.
Onyx do: see if you can rationalise how many tasks you have, especially if they merely feed into each other. You should generally have one virtual peer per core, so too many tasks may mean you need to oversubscribe your cores. Extra tasks also add latency and serialisation overhead, and can cause extra retries.
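When two function tasks merely feed into each other, a sketch of collapsing them is simply composing the functions and publishing one catalog entry instead of two. All names here are hypothetical:

```clojure
;; Two steps that used to be separate tasks:
(defn parse [segment]
  (update segment :payload read-string))

(defn enrich [segment]
  (assoc segment :seen-at (System/currentTimeMillis)))

;; Compose them into a single :onyx/fn:
(def parse-and-enrich (comp enrich parse))

;; One catalog entry instead of two, saving a virtual peer,
;; a serialisation hop, and extra latency:
{:onyx/name :parse-and-enrich
 :onyx/fn ::parse-and-enrich
 :onyx/type :function
 :onyx/batch-size 50}
```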
Something about confusion about virtual peers starting up and how they're allocated.
Retries (metrics): look at batch latency for your tasks. If any task's batch latency is a significant proportion of your pending-timeout, then something is wrong. Optimise that task or increase the pending-timeout.
Increasing batch size helps increase throughput for plugins, and for lifecycle users that operate on the whole batch. Increasing batch size with slow function calls on segments can hurt you, though, because it reduces your chance of acking a segment before the pending-timeout expires.
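A sketch of the trade-off as two catalog entries; the task names and values are hypothetical, and the right numbers depend on measuring batch latency against your pending-timeout:

```clojure
;; A high-throughput writer benefits from larger batches,
;; amortising per-batch overhead:
{:onyx/name :write-output
 :onyx/type :output
 :onyx/batch-size 500
 :onyx/batch-timeout 50} ; ms; keep low so small trickles still flush

;; A task with a slow per-segment function should use small batches,
;; so each segment acks well before the input's pending-timeout:
{:onyx/name :call-slow-api
 :onyx/fn ::call-slow-api
 :onyx/type :function
 :onyx/batch-size 5
 :onyx/batch-timeout 1000}
```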