Abhay Saxena

Showing 40 comments by Abhay Saxena

The Telepresence documentation has a [list of dependencies](https://www.telepresence.io/reference/install.html#dependencies), but it does not explain how to install these dependencies on common platforms. We could use your help to fix this! Can...

Excellent! Roughly speaking, you've implemented a variant of option 1 by hand. I agree that your approach should work in many cases. I am going to experiment some...

Not sure whether you're ready for feedback on this, but I'm _very_ excited for this feature.

```
llama-server --model ${gguf}/DeepSeek-V3.1-Terminus-UD-Q4_K_XL-00001-of-00008.gguf \
  --alias deepseek/deepseek-v3.1-terminus --jinja -fa on \
  --reasoning-budget 0 --reasoning-format deepseek --fit-ctx...
```
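(For context, assuming a recent `llama-server` build: `--jinja` enables Jinja chat templates, `-fa on` forces flash attention, `--reasoning-budget 0` disables thinking output, and `--reasoning-format deepseek` controls how reasoning content is returned. The `--fit-ctx...` at the end is truncated in the digest and belongs to the fit feature under discussion.)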

> does it work without `--cache-ram`?

No. Same error, down to the number: `failed to allocate CUDA1 buffer of size 11693719552`
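(For scale: 11693719552 bytes / 1024³ ≈ 10.9 GiB that the allocator could not find on the second GPU, CUDA1.)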

Same command as before, except without `--cache-ram`, yields

```
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA RTX 6000 Ada Generation, compute capability 8.9,...
```

Log file: [ark3-fit.log](https://github.com/user-attachments/files/23196819/ark3-fit.log)

It worked! VRAM usage is pretty good too: 89% and 96% once I added `-ub 4096 -b 8192`, without which prompt processing (PP) was unbearably slow. More importantly, I was able to...
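(A minimal sketch of that launch, assuming the same model and flags as the earlier command; only the batch-size flags are new. `-ub` sets the physical micro-batch size (`--ubatch-size`) and `-b` the logical batch size (`--batch-size`).)

```sh
# Sketch: previous launch plus larger batch sizes for faster prompt processing.
# Model path and alias are from the earlier comment; the flags elided there
# by truncation are omitted here as well.
llama-server \
  --model ${gguf}/DeepSeek-V3.1-Terminus-UD-Q4_K_XL-00001-of-00008.gguf \
  --alias deepseek/deepseek-v3.1-terminus \
  --jinja -fa on \
  -ub 4096 -b 8192
```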

Can everything your `--fit` code determines be expressed as a set of `-ot` options and the like? If so, would it be possible to have a separate utility that does...
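(For readers unfamiliar with `-ot`: it is short for `--override-tensor` and takes `<tensor-name regex>=<buffer type>` pairs, so in principle a fit result could be serialized as a list of such pairs. A made-up illustration, not output from the fit code:)

```sh
# Hypothetical hand-written placement of the kind --fit might emit:
# offload all layers (-ngl 99) but force the FFN expert tensors of
# blocks 30-49 onto the CPU. The regex and layer range are invented.
llama-server --model model.gguf -ngl 99 \
  -ot 'blk\.(3[0-9]|4[0-9])\.ffn_.*_exps.*=CPU'
```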

Great! In case the motivation is not obvious: that would provide an alternative path if there is resistance to adding this feature because of the new startup-time cost.

The output from `llama-fit-params` appears to match what `llama-server` does. The result this time is slightly worse on VRAM usage (87% and 97%, per `nvtop`) than last time, but still...