llm Deterministic generations

Given the same seed and prompt, the same text should be generated. This will require a deterministic, seedable PRNG (instead of `thread_rng`) and a way for the caller to specify the seed. It should also help with benchmarking.
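
As a rough illustration of what seeded sampling could look like, here is a minimal, std-only sketch. The `SplitMix64` generator, the `sample_token` and `generate` helpers, and the toy probability table are all assumptions made up for this example, not the crate's actual implementation. In practice a seedable generator from the `rand` crate (for example `StdRng::seed_from_u64`) would probably fill the same role.

```rust
/// SplitMix64: a tiny deterministic PRNG, used here only for illustration.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f32 in [0, 1), built from the top 24 bits of the next u64.
    fn next_f32(&mut self) -> f32 {
        (self.next_u64() >> 40) as f32 / (1u32 << 24) as f32
    }
}

/// Hypothetical helper: pick an index from a probability table (inverse CDF).
fn sample_token(rng: &mut SplitMix64, probs: &[f32]) -> usize {
    let r = rng.next_f32();
    let mut cumulative = 0.0f32;
    for (i, &p) in probs.iter().enumerate() {
        cumulative += p;
        if r < cumulative {
            return i;
        }
    }
    probs.len() - 1 // fall back to the last index if rounding left a gap
}

/// Generate `n` token ids from a fixed toy distribution using the given seed.
fn generate(seed: u64, n: usize) -> Vec<usize> {
    let probs = [0.5f32, 0.25, 0.125, 0.125];
    let mut rng = SplitMix64::new(seed);
    (0..n).map(|_| sample_token(&mut rng, &probs)).collect()
}

fn main() {
    let a = generate(42, 16);
    let b = generate(42, 16);
    assert_eq!(a, b); // same seed, same sequence
    println!("{:?}", a);
}
```

The only design point that matters here is that every random draw flows through one generator whose state is fully determined by the seed. Any sampling path that still reaches for `thread_rng` would break reproducibility.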

Mar 16 '23 18:03 philpax

A lofty goal!

Be aware that under the hood llama (and indeed most ANNs) use floating-point, and floating-point determinism is a rabbit hole with no bottom.

Some particular issues that come to mind:

- Subnormal handling (can be flushed to zero, or not).
- Extended precision (intermediate values can be evaluated in higher precision, or not).
  - This can be done by the compiler, or in libraries.
  - E.g. it seems like ggml in some cases uses f32 for f16 evaluation.
  - GCC and clang can force non-extended precision for floating-point ops... but only for named variables, not temporaries. I haven't seen any equivalent to do even that for Rust.
- Transcendental functions in general (can vary slightly in different implementations).
  - E.g. ggml uses the host tanh / etc.
- FPU mode bits in general (e.g. rounding modes).
  - This is thread-local state, and can be affected by things like 'what shared libraries are injected'.
  - Famously, at one point there was a printer driver that was clobbering FPU state - so if you opened a file picker in Windows your FPU results would then be different within that thread. Lovely.
- Ordering within summations and other reductions.
  - E.g. it seems like 4-wide and 8-wide SIMD implementations in ggml use different summation orderings.
  - E.g. it seems like dot product uses a different summation ordering for SIMD / non-SIMD.
- Conversion between floats and integers (and vice versa).
  - Ties into rounding modes, above.
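
To make the summation-ordering point concrete, here is a small sketch of why lane width alone can change a result. The `sum_lanes` function and the input data are assumptions made up for illustration; this is not ggml's code, just a scalar model of how an N-wide reduction groups its terms.

```rust
/// Sum `xs` using `N` independent accumulators ("lanes"), then combine the
/// lanes left to right. This models how an N-wide SIMD reduction groups terms.
fn sum_lanes<const N: usize>(xs: &[f32]) -> f32 {
    let mut acc = [0.0f32; N];
    for (i, &x) in xs.iter().enumerate() {
        acc[i % N] += x;
    }
    acc.iter().fold(0.0f32, |total, &lane| total + lane)
}

fn main() {
    // Toy data: near 1e8 an f32 can only represent multiples of 8,
    // so adding 1.0 to 1e8 is lost to rounding.
    let xs = [1.0e8f32, 1.0, 1.0, 1.0, -1.0e8, 1.0, 1.0, 1.0];

    let four_wide = sum_lanes::<4>(&xs);
    let eight_wide = sum_lanes::<8>(&xs);

    println!("4 lanes: {}, 8 lanes: {}", four_wide, eight_wide);
    assert_ne!(four_wide, eight_wide);
}
```

With four lanes the two large values land in the same accumulator and cancel exactly before anything small is added to them, so the result is 6.0. With eight lanes the small values are added to 1e8 during the final combine and rounded away, giving 3.0. Both are valid IEEE 754 evaluations of the same sum.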

See also e.g. https://github.com/rust-lang/unsafe-code-guidelines/issues/237.

All told: it's doable, with a fair bit of effort, and has been done before (look up lockstep networking for games - much the same issue). Just be aware that it's not a trivial task, especially if you demand determinism between different machines, not just between different compiles on the same machine.

Mar 17 '23 01:03 nonnull-ca

I would say, for the time being, determinism given the same hardware and compiler version is a good enough goal. Going beyond that and trying to make things deterministic across different kinds of hardware is probably going to negatively affect performance.

Mar 17 '23 11:03 setzer22

Aye, agreed with setzer - the primary thing I want is to be able to specify the same parameters on the same machine and get the same results. We can think about offering a "fully deterministic" mode later, but as you've mentioned, madness lies that way.

Mar 17 '23 12:03 philpax

I think this can be closed since the rest of the work to get determinism is out-of-scope for now :+1:

Mar 20 '23 20:03 setzer22
