JHH11
I encountered a similar issue. After fine-tuning an LLM and quantizing it with `llama.cpp`, the model works perfectly when accessed from the terminal via `llama-cli`. However, when I attempt to...
@smolraccoon Thanks for sharing, but the method didn't work for me. By the way, should the `chat_format` be set to `llama-3`?
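In other words, should the loading code look something like this? (The model path is a placeholder for my setup.)

```python
from llama_cpp import Llama

# Placeholder path to the quantized GGUF file produced by llama.cpp
llm = Llama(
    model_path="./finetuned-q4_k_m.gguf",
    chat_format="llama-3",  # is this the right built-in chat template here?
    n_gpu_layers=-1,        # offload all layers to the GPU
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Hello"}],
)
print(response["choices"][0]["message"]["content"])
```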
Thank you @blkqi, your advice really helped me. In my case, my Dockerfile looked like this:

```dockerfile
ENV LD_LIBRARY_PATH=/usr/local/cuda-12.4/compat/libcuda.so
RUN CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python==0.2.90
```
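For anyone else hitting this: `CMAKE_ARGS="-DGGML_CUDA=on"` forces pip to compile llama-cpp-python with CUDA support instead of falling back to a CPU-only build, and the `compat` directory holds NVIDIA's forward-compatibility driver libraries for containers. Note that `LD_LIBRARY_PATH` conventionally lists directories, so `/usr/local/cuda-12.4/compat` may be the cleaner value.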