Anton Lokhmotov
> This was discussed in the inference WG when Nvidia proposed the equal issue mode for GPTJ and LLAMA2. Is there anything in the rules on this?

> But for...
Agree. We've had this triggered even when only file permissions changed (e.g. `g+w`).
Or is it the same file stored under [configs/default_config.yaml](https://github.com/mlcommons/training/blob/master/llama2_70b_lora/configs/default_config.yaml)?
> when a single model is split across multiple GPUs and hence the performance per accelerator might be better on a larger-scale system

I agree that Offline may be...
Another scenario to consider: having done a Preview submission with an old Available server (e.g. v5) equipped with new Preview accelerators, a submitter may want to do an Available submission...
**N.B.:** We are aware of an open [vLLM issue](https://github.com/vllm-project/vllm/pull/12802), due to which setting the temperature to zero still results in non-determinism. We may need to recalibrate the reference accuracy for the...
Inference WG 18/Feb/2025: multiple parties ran the reference implementation and obtained identical results. Maybe it's a by-product of `topk=1`. Optimized submissions should use the same parameters.
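For concreteness, a minimal sketch of pinning vLLM's offline API to greedy decoding (the model name, prompt, and `max_tokens` below are illustrative, not from the WG discussion):

```python
from vllm import LLM, SamplingParams

# Placeholder model and prompt (assumptions, not from the thread).
llm = LLM(model="meta-llama/Llama-2-70b-hf")

# temperature=0 requests greedy decoding; top_k=1 additionally restricts
# the candidate set to the single most likely token, mirroring the
# reference run's parameters.
params = SamplingParams(temperature=0.0, top_k=1, max_tokens=128)

outputs = llm.generate(["What is MLPerf?"], params)
print(outputs[0].outputs[0].text)
```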
> ## [Does a temperature of 0 result in non-determinism?](https://www.vellum.ai/llm-parameters/temperature)
>
> A common case of confusion is if a temperature of 0 generates non-deterministic replies. In theory, yes. In practice,...
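The usual root cause of the residual non-determinism is that floating-point addition is not associative: parallel reductions on a GPU may sum the same logits in a different order across runs, which can flip an argmax between near-tied tokens. A toy illustration (hypothetical, not from the thread):

```python
import random

random.seed(0)
vals = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

# Summing the same values in a different order usually yields a slightly
# different float, which is enough to change an argmax near a tie.
print(sum(vals) == sum(reversed(vals)))  # frequently False
```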
It's like [flammable vs inflammable](https://www.merriam-webster.com/grammar/flammable-or-inflammable).
clang-format has many [style options](https://clang.llvm.org/docs/ClangFormatStyleOptions.html) and even several predefined styles (LLVM, Google, Chromium, Mozilla, WebKit, Microsoft). Any thoughts on which one we should adopt? As a recycled compiler person :), I'd...
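For example, a possible `.clang-format` starting point (purely illustrative, assuming we inherit a predefined style and override only what we explicitly agree on):

```yaml
# Illustrative .clang-format sketch, not a decision: start from the LLVM
# predefined style and override a handful of agreed options.
BasedOnStyle: LLVM
IndentWidth: 2
ColumnLimit: 100
PointerAlignment: Left
```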