charleswg

15 comments by charleswg

Actually, I was trying to load the trained model to predict the hazard ratio of incoming data. But I took a step back and tried in-session prediction right after training....
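(For illustration only: the survival library used in the original thread isn't shown in this excerpt, so the sketch below uses lifelines' CoxPHFitter purely as a stand-in for "train, predict in-session, then save and reload to score incoming data".)

```python
# Illustrative sketch, assuming lifelines; the original thread's library is not named here.
import pickle
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

df = load_rossi()  # example dataset: duration column 'week', event column 'arrest'
cph = CoxPHFitter()
cph.fit(df, duration_col="week", event_col="arrest")

# In-session prediction right after training: partial hazards for new rows.
incoming = df.drop(columns=["week", "arrest"]).head(5)
print(cph.predict_partial_hazard(incoming))

# Persist the fitted model and reload it later to score incoming data.
with open("cox_model.pkl", "wb") as f:
    pickle.dump(cph, f)
with open("cox_model.pkl", "rb") as f:
    restored = pickle.load(f)
print(restored.predict_partial_hazard(incoming))
```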

Two A5000 cards have no issues; both show GPU memory allocation and compute usage.

Same here, as it's not offloading to the GPU for some reason:
load_tensors: offloading 0 repeating layers to GPU
load_tensors: offloaded 0/65 layers to GPU
load_tensors: CPU_Mapped model buffer size =...
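(The log above shows 0/65 layers offloaded. This comment concerns the prebuilt executables, where the relevant knob is the -ngl / --n-gpu-layers flag plus a GPU-enabled build; purely as an illustration, here is the equivalent via the llama-cpp-python bindings. The model path is a placeholder, not a file from the thread.)

```python
# Illustrative sketch: assumes llama-cpp-python installed with GPU (e.g. CUDA) support.
from llama_cpp import Llama

llm = Llama(
    model_path="./model.gguf",  # placeholder path
    n_gpu_layers=-1,            # -1 requests offloading all layers; 0 keeps everything on CPU
    verbose=True,               # prints the loader log so you can confirm how many layers were offloaded
)

out = llm("Q: What is 2 + 2? A:", max_tokens=8)
print(out["choices"][0]["text"])
```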

I was actually able to run it. I ran the newest llama.cpp executables, which support it.

I concur. I've been testing how to use llama-server with the latest compiled exe but never got tool calling working. I was wondering if I have to compile it from source.
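(For reference, this is roughly what the client side of tool calling looks like against a local llama-server exposing its OpenAI-compatible API; the port, model name, and get_weather tool below are placeholders, not values from the thread. As far as I can tell, recent builds also need the server started with the --jinja flag for tool calls to be emitted.)

```python
# Illustrative sketch: assumes llama-server is already running locally on port 8080.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="no-key-needed")

# Tool definition in the standard OpenAI function-calling schema (hypothetical example tool).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="local-model",  # placeholder; a single-model server typically ignores this
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)

# If the model chose to call the tool, the call appears here instead of plain text.
msg = resp.choices[0].message
print(msg.tool_calls or msg.content)
```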