llm
Deterministic generations
Given the same seed and prompt, the same text should be generated. This will require us to implement a deterministic PRNG (instead of using thread_rng), and to allow specifying a seed. This should also assist in benchmarking.
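A minimal sketch of what "seeded, deterministic sampling" could look like, using only the standard library. The `SplitMix64` generator, `sample_token`, and `generate` are all hypothetical names for illustration and are not taken from this project. In practice a seedable generator from the `rand` crate (for example `StdRng` via `SeedableRng::seed_from_u64`) would probably be the more natural replacement for `thread_rng`.

```rust
/// SplitMix64: a tiny deterministic PRNG, used here purely for illustration.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        Self { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    /// Uniform f32 in [0, 1), built from the top 24 bits so the value is exact.
    fn next_f32(&mut self) -> f32 {
        (self.next_u64() >> 40) as f32 / (1u32 << 24) as f32
    }
}

/// Hypothetical sampler: picks an index from `probs`.
/// Assumes `probs` is non-empty and sums to roughly 1.
fn sample_token(probs: &[f32], rng: &mut SplitMix64) -> usize {
    let r = rng.next_f32();
    let mut cumulative = 0.0f32;
    for (i, p) in probs.iter().enumerate() {
        cumulative += *p;
        if r < cumulative {
            return i;
        }
    }
    probs.len() - 1 // fall back to the last index if rounding leaves a gap
}

/// Draws `n` token indices from a fixed distribution using the given seed.
fn generate(seed: u64, probs: &[f32], n: usize) -> Vec<usize> {
    let mut rng = SplitMix64::new(seed);
    (0..n).map(|_| sample_token(probs, &mut rng)).collect()
}

fn main() {
    let probs: [f32; 4] = [0.1, 0.2, 0.3, 0.4];
    let a = generate(42, &probs, 16);
    let b = generate(42, &probs, 16);
    assert_eq!(a, b); // same seed and same inputs give the same draws
    println!("{:?}", a);
}
```

This only pins down the random choices. The probabilities fed into the sampler still come out of floating-point inference, so the seed alone guarantees identical output only when those probabilities are themselves bit-identical.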
A lofty goal!
Be aware that under the hood llama (and indeed most ANNs) uses floating-point arithmetic, and floating-point determinism is a rabbit hole with no bottom.
Some particular issues that come to mind:
- Subnormal handling (can be flushed to zero, or not).
- Extended precision (intermediate values can be evaluated in higher precision, or not).
  - This can be done by the compiler, or in libraries.
    - E.g. it seems like ggml in some cases uses f32 for f16 evaluation.
  - GCC and clang can force non-extended precision for floating-point ops... but only for named variables, not temporaries. I haven't seen any equivalent to do even that for Rust.
- Transcendental functions in general (can vary slightly in different implementations).
  - E.g. ggml uses the host tanh / etc.
- FPU mode bits in general (e.g. rounding modes).
  - This is thread-local state, and can be affected by things like 'what shared libraries are injected'.
    - Famously, at one point there was a printer driver that was clobbering FPU state - so if you opened a file picker in Windows your FPU results would then be different within that thread. Lovely.
- Ordering within summations and other reductions.
- Conversion between floats and integers (and vice versa).
  - Ties into rounding modes, above.
See also e.g. https://github.com/rust-lang/unsafe-code-guidelines/issues/237.
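To make the "ordering within summations" item concrete, here is a small sketch showing that floating-point addition is not associative, so the same values summed in a different order can give a different result. `sum_in_order` is a hypothetical helper, not anything from ggml or this project, and the values are chosen only to make the effect obvious.

```rust
/// Sums strictly left to right, so the result depends only on the input order.
fn sum_in_order(values: &[f32]) -> f32 {
    let mut acc = 0.0f32;
    for v in values {
        acc += *v;
    }
    acc
}

fn main() {
    // The same three values in two different orders.
    let a = [1.0e20f32, -1.0e20, 1.0];
    let b = [1.0e20f32, 1.0, -1.0e20];

    // In `a` the large terms cancel first, so the 1.0 survives.
    assert_eq!(sum_in_order(&a), 1.0);
    // In `b` the 1.0 is absorbed by 1.0e20 before the cancellation happens.
    assert_eq!(sum_in_order(&b), 0.0);

    println!("a = {}, b = {}", sum_in_order(&a), sum_in_order(&b));
}
```

A reduction that is split across threads or SIMD lanes effectively reorders the additions, which is one way results can differ between runs or machines even when every individual operation is correctly rounded.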
All told: it's doable, with a fair bit of effort, and has been done before (look up lockstep networking for games - much the same issue). Just be aware that it's not a trivial task, especially if you demand determinism between different machines, not just between different compiles on the same machine.
I would say, for the time being, determinism given the same hardware and compiler version is a good enough goal. Going beyond that and trying to make things deterministic across different kinds of hardware is probably going to negatively affect performance.
Aye, agreed with setzer - the primary thing I want is to be able to specify the same parameters on the same machine and get the same results. We can think about offering a "fully deterministic" mode later but, as you've mentioned, madness lies that way.
I think this can be closed since the rest of the work to get determinism is out-of-scope for now :+1: