Tuan T. Pham
@g-ramirez : Thanks for sharing this. I use this patch, together with other changes from [`packer-windows`][0], to build a MaaS image of Windows Server 2019 for Hyper-V. However, I noticed that the script...
This might not be the right solution, but here is a patch for that: https://github.com/neofob/chatbot-rnn/commit/1f56cb941b834c5bc95c8f40fe58ce08277f4d10
@carlitosmanuelitos : No luck so far.
@carlitosmanuelitos : According to the [Meta webpage][0], the prompt looks like this:
```
Source: system
System prompt
Source: user
First user query
Source: assistant
Model response to first query
Source: user
...
```
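As a hedged sketch, the format above can be assembled from a chat history like this. The `Source: <role>` layout follows the excerpt, but the exact separators and whitespace are assumptions; verify against Meta's official template before relying on it:

```python
def build_codellama_prompt(messages):
    """Assemble a CodeLlama-70B-Instruct-style prompt from chat messages.

    Sketch only: mirrors the "Source: <role>" layout quoted above; the
    exact separators/whitespace are assumptions, not Meta's spec.
    """
    lines = [f"Source: {m['role']}\n{m['content']}" for m in messages]
    # Leave a trailing "Source: assistant" header so the model completes it.
    lines.append("Source: assistant")
    return "\n".join(lines)

history = [
    {"role": "system", "content": "System prompt"},
    {"role": "user", "content": "First user query"},
]
print(build_codellama_prompt(history))
```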
The current agony: * https://www.reddit.com/r/LocalLLaMA/comments/1afweyw/quick_headsup_about_using_codellama_70b_and/
@eode : You should open a PR for this.
@a-b-n-e-o : The P6000's FP16 performance is low compared to modern GPUs, so forcing it to use FP32 is the better option. Add `-DLLAMA_CUDA_FORCE_MMQ=on` to your flags: `CMAKE_ARGS='-DLLAMA_CUBLAS=on -DLLAMA_CUDA_FORCE_MMQ=on'` References: *...
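For llama-cpp-python, such `CMAKE_ARGS` are typically passed at install time so the CUDA kernels get rebuilt; a sketch (the package name and pip flags are the standard ones, but verify for your setup):

```shell
# Rebuild llama-cpp-python from source with cuBLAS enabled and the
# MMQ (quantized matmul) kernels forced on, avoiding the FP16 path.
CMAKE_ARGS='-DLLAMA_CUBLAS=on -DLLAMA_CUDA_FORCE_MMQ=on' \
  pip install --force-reinstall --no-cache-dir llama-cpp-python
```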
@thanhtantran :
* By default, privategpt offloads all layers to the GPU. In your case, all 33 layers are offloaded. You can adjust that number in the file [`llm_component.py:45`][0]
* Running...
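A minimal sketch of the knob in question. `n_gpu_layers` is the llama.cpp parameter name; the model path and the value 20 here are purely illustrative:

```python
# Illustrative only: the kind of kwargs privategpt's llm_component.py
# passes to its llama.cpp-based loader.
llm_kwargs = {
    "model_path": "models/your-model.gguf",  # hypothetical path
    "n_gpu_layers": 20,  # offload 20 of the 33 layers; -1 (or 33) offloads all
}
print(llm_kwargs["n_gpu_layers"])
```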
@charlyjna : Multi-GPU crashes in "Query Docs" mode for me as well. It works in "LLM Chat" mode, though. I have an RTX 4000 Ada SFF and a P40.
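Until the crash is fixed, one hedged workaround (a standard CUDA mechanism, not specific to privategpt) is to hide all but one GPU from the process so the multi-GPU path is never taken:

```shell
# Pin the process to GPU index 0; run `nvidia-smi -L` to pick the index.
# Replace the launch command with however you start privategpt.
CUDA_VISIBLE_DEVICES=0 python -m private_gpt
```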