Anton Lokhmotov
> This was discussed in the inference WG when Nvidia proposed the equal issue mode for GPTJ and LLAMA2. Is there anything in the rules on this?

> But for...
Agree. We've had this triggered even when only file permissions changed (e.g. `g+w`).
Or is it the same file stored under [configs/default_config.yaml](https://github.com/mlcommons/training/blob/master/llama2_70b_lora/configs/default_config.yaml)?
> when a single model is split across multiple GPUs and hence the performance per accelerator might be better on a larger-scale system

I agree that Offline may be...
Another scenario to consider: having done a Preview submission with an old Available server (e.g. v5) equipped with new Preview accelerators, a submitter may want to do an Available submission...
**N.B.:** We are aware of an open [vLLM issue](https://github.com/vllm-project/vllm/pull/12802), due to which setting the temperature to zero still results in non-determinism. We may need to recalibrate the reference accuracy for the...
Inference WG 18/Feb/2025: multiple parties ran the reference implementation and obtained identical results. Maybe it's a by-product of `topk=1`. Optimized submissions should use the same parameters.
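For concreteness, a minimal sketch of pinning vLLM's offline API to greedy decoding (the model name, prompt, and `max_tokens` below are illustrative, not from the WG discussion):

```python
from vllm import LLM, SamplingParams

# Placeholder model and prompt (assumptions, not from the thread).
llm = LLM(model="meta-llama/Llama-2-70b-hf")

# temperature=0 requests greedy decoding; top_k=1 additionally restricts
# the candidate set to the single most likely token, mirroring the
# reference run's parameters.
params = SamplingParams(temperature=0.0, top_k=1, max_tokens=128)

outputs = llm.generate(["What is MLPerf?"], params)
print(outputs[0].outputs[0].text)
```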
> ## [Does a temperature of 0 result in non-determinism?](https://www.vellum.ai/llm-parameters/temperature)
>
> A common case of confusion is if a temperature of 0 generates non-deterministic replies. In theory, yes. In practice,...
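The usual root cause of the residual non-determinism is that floating-point addition is not associative: parallel reductions on a GPU may sum the same logits in a different order across runs, which can flip an argmax between near-tied tokens. A toy illustration (hypothetical, not from the thread):

```python
import random

random.seed(0)
vals = [random.uniform(-1.0, 1.0) for _ in range(100_000)]

# Summing the same values in a different order usually yields a slightly
# different float, which is enough to change an argmax near a tie.
print(sum(vals) == sum(reversed(vals)))  # frequently False
```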
It's like [flammable vs inflammable](https://www.merriam-webster.com/grammar/flammable-or-inflammable).
clang-format has many [style options](https://clang.llvm.org/docs/ClangFormatStyleOptions.html) and even several predefined styles (LLVM, Google, Chromium, Mozilla, WebKit, Microsoft). Any thoughts on which one we should adopt? As a recycled compiler person :), I'd...
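For example, a possible `.clang-format` starting point (purely illustrative, assuming we inherit a predefined style and override only what we explicitly agree on):

```yaml
# Illustrative .clang-format sketch, not a decision: start from the LLVM
# predefined style and override a handful of agreed options.
BasedOnStyle: LLVM
IndentWidth: 2
ColumnLimit: 100
PointerAlignment: Left
```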