Abhay Saxena
The Telepresence documentation has a [list of dependencies](https://www.telepresence.io/reference/install.html#dependencies), but it does not explain how to install these dependencies on common platforms. We could use your help to fix this! Can...
Excellent! Roughly speaking, you've implemented a variant of option 1 by hand. I agree that your approach should work in many cases. I am going to experiment some...
llama: automatically set parameters not set by the user in such a way that maximizes GPU utilization
Not sure whether you're ready for feedback on this, but I'm _very_ excited for this feature.

```
llama-server --model ${gguf}/DeepSeek-V3.1-Terminus-UD-Q4_K_XL-00001-of-00008.gguf \
  --alias deepseek/deepseek-v3.1-terminus --jinja -fa on --reasoning-budget 0 \
  --reasoning-format deepseek --fit-ctx...
```
> does it work without `--cache-ram`? No. Same error, down to the number: `failed to allocate CUDA1 buffer of size 11693719552`
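For scale, that failed allocation is just under 11 GiB; a quick conversion (the byte count is taken verbatim from the error above):

```python
# Size reported in the "failed to allocate CUDA1 buffer" error, in bytes.
failed_alloc_bytes = 11693719552

# Convert to GiB (2**30 bytes) to compare against per-GPU VRAM headroom.
print(f"{failed_alloc_bytes / 2**30:.2f} GiB")  # 10.89 GiB
```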
Same command as before, except without `--cache-ram`, yields

```
ggml_cuda_init: GGML_CUDA_FORCE_MMQ: no
ggml_cuda_init: GGML_CUDA_FORCE_CUBLAS: no
ggml_cuda_init: found 2 CUDA devices:
  Device 0: NVIDIA RTX 6000 Ada Generation, compute capability 8.9,...
```
Log file: [ark3-fit.log](https://github.com/user-attachments/files/23196819/ark3-fit.log)
It worked! VRAM usage is pretty good too: 89% and 96% once I added `-ub 4096 -b 8192`, without which PP was unbearably slow. More importantly, I was able to...
Can everything your `--fit` code determines be expressed as a set of `-ot` options and the like? If so, would it be possible to have a separate utility that does...
Great! In case the motivation is not obvious: that would provide an alternative if there is resistance to merging this feature because of the added startup-time cost.
The output from `llama-fit-params` appears to match what `llama-server` does. The result this time is slightly worse on VRAM usage (87% and 97% by `nvtop`) than last time, but still...
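For anyone skimming, the split-utility idea discussed above could be plumbed together roughly like this. This is only a sketch: the flag list and the output format of `llama-fit-params` are assumptions, and `echo` stands in for the fitting tool so the wiring itself is runnable.

```shell
#!/bin/sh
# Two-step flow: a fitting tool prints the tuned flags, and the server
# invocation consumes them. `echo` is a stand-in for `llama-fit-params`;
# the flags shown are hypothetical examples, not its actual output.
fit_flags=$(echo "-ngl 40 -ub 4096 -b 8192")

# The real invocation would then be:
#   llama-server --model "$gguf" $fit_flags
printf 'would run: llama-server %s\n' "$fit_flags"
```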