charleswg

15 comments by charleswg

Actually, I was trying to load the trained model to predict the hazard ratio of incoming data. But I took a step back and tried in-session prediction right after training....
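(For illustration only: the survival library used in the original thread isn't shown in this excerpt, so the sketch below uses lifelines' CoxPHFitter purely as a stand-in for "train, predict in-session, then save and reload to score incoming data".)

```python
# Illustrative sketch, assuming lifelines; the original thread's library is not named here.
import pickle
from lifelines import CoxPHFitter
from lifelines.datasets import load_rossi

df = load_rossi()  # example dataset: duration column 'week', event column 'arrest'
cph = CoxPHFitter()
cph.fit(df, duration_col="week", event_col="arrest")

# In-session prediction right after training: partial hazards for new rows.
incoming = df.drop(columns=["week", "arrest"]).head(5)
print(cph.predict_partial_hazard(incoming))

# Persist the fitted model and reload it later to score incoming data.
with open("cox_model.pkl", "wb") as f:
    pickle.dump(cph, f)
with open("cox_model.pkl", "rb") as f:
    restored = pickle.load(f)
print(restored.predict_partial_hazard(incoming))
```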

Two A5000 cards have no issues; both show GPU memory allocation and compute usage.

Same here, as it's not offloading to the GPU for some reason:
load_tensors: offloading 0 repeating layers to GPU
load_tensors: offloaded 0/65 layers to GPU
load_tensors: CPU_Mapped model buffer size =...
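(The log above shows 0/65 layers offloaded. This comment concerns the prebuilt executables, where the relevant knob is the -ngl / --n-gpu-layers flag plus a GPU-enabled build; purely as an illustration, here is the equivalent via the llama-cpp-python bindings. The model path is a placeholder, not a file from the thread.)

```python
# Illustrative sketch: assumes llama-cpp-python installed with GPU (e.g. CUDA) support.
from llama_cpp import Llama

llm = Llama(
    model_path="./model.gguf",  # placeholder path
    n_gpu_layers=-1,            # -1 requests offloading all layers; 0 keeps everything on CPU
    verbose=True,               # prints the loader log so you can confirm how many layers were offloaded
)

out = llm("Q: What is 2 + 2? A:", max_tokens=8)
print(out["choices"][0]["text"])
```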

I was actually able to run it. I ran the newest llama.cpp executables, which support it.

I concur. I've been testing how to use llama-server with the latest compiled exe but never got tool calling working. I was wondering if I have to compile it from source.
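(For reference, this is roughly what the client side of tool calling looks like against a local llama-server exposing its OpenAI-compatible API; the port, model name, and get_weather tool below are placeholders, not values from the thread. As far as I can tell, recent builds also need the server started with the --jinja flag for tool calls to be emitted.)

```python
# Illustrative sketch: assumes llama-server is already running locally on port 8080.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="no-key-needed")

# Tool definition in the standard OpenAI function-calling schema (hypothetical example tool).
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="local-model",  # placeholder; a single-model server typically ignores this
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)

# If the model chose to call the tool, the call appears here instead of plain text.
msg = resp.choices[0].message
print(msg.tool_calls or msg.content)
```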