JHH11
I encountered a similar issue. After fine-tuning an LLM and quantizing it with `llama.cpp`, the model works perfectly when accessed from the terminal via `llama-cli`. However, when I attempt to...
@smolraccoon Thanks for sharing, but the method didn't work for me. By the way, should the `chat_format` be set to `llama-3`?
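In other words, should the loading code look something like this? (The model path is a placeholder for my setup.)

```python
from llama_cpp import Llama

# Placeholder path to the quantized GGUF file produced by llama.cpp
llm = Llama(
    model_path="./finetuned-q4_k_m.gguf",
    chat_format="llama-3",  # is this the right built-in chat template here?
    n_gpu_layers=-1,        # offload all layers to the GPU
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello"}],
)
print(response["choices"][0]["message"]["content"])
```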
Thank you @blkqi, your advice really helped me. In my case, my Dockerfile looked like this:

```dockerfile
ENV LD_LIBRARY_PATH=/usr/local/cuda-12.4/compat/libcuda.so
RUN CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python==0.2.90
```
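For anyone else hitting this: `CMAKE_ARGS="-DGGML_CUDA=on"` forces pip to compile llama-cpp-python with CUDA support instead of falling back to a CPU-only build, and the `compat` directory holds NVIDIA's forward-compatibility driver libraries for containers. Note that `LD_LIBRARY_PATH` conventionally lists directories, so `/usr/local/cuda-12.4/compat` may be the cleaner value.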