Tuan T. Pham
@g-ramirez : Thanks for sharing this. I use this patch, together with other changes from [`packer-windows`][0], to build a MaaS image of Windows Server 2019 for Hyper-V. However, I noticed that the script...
This might not be the right solution, but here is a patch for that: https://github.com/neofob/chatbot-rnn/commit/1f56cb941b834c5bc95c8f40fe58ce08277f4d10
@carlitosmanuelitos : No luck so far.
@carlitosmanuelitos : According to the [Meta webpage][0], the prompt looks like this:
```
Source: system
System prompt
Source: user
First user query
Source: assistant
Model response to first query
Source: user
...
```
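As a hedged sketch, the format above can be assembled from a chat history like this. The `Source: <role>` layout follows the excerpt, but the exact separators and whitespace are assumptions; verify against Meta's official template before relying on it:

```python
def build_codellama_prompt(messages):
    """Assemble a CodeLlama-70B-Instruct-style prompt from chat messages.

    Sketch only: mirrors the "Source: <role>" layout quoted above; the
    exact separators/whitespace are assumptions, not Meta's spec.
    """
    lines = [f"Source: {m['role']}\n{m['content']}" for m in messages]
    # Leave a trailing "Source: assistant" header so the model completes it.
    lines.append("Source: assistant")
    return "\n".join(lines)

history = [
    {"role": "system", "content": "System prompt"},
    {"role": "user", "content": "First user query"},
]
print(build_codellama_prompt(history))
```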
The current agony: * https://www.reddit.com/r/LocalLLaMA/comments/1afweyw/quick_headsup_about_using_codellama_70b_and/
@eode : You should open a PR for this.
@a-b-n-e-o : The P6000's FP16 performance is low compared to modern GPUs, so forcing it to use FP32 is the better option. Add `-DLLAMA_CUDA_FORCE_MMQ=on` to your flags: `CMAKE_ARGS='-DLLAMA_CUBLAS=on -DLLAMA_CUDA_FORCE_MMQ=on'` References: *...
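For llama-cpp-python, such `CMAKE_ARGS` are typically passed at install time so the CUDA kernels get rebuilt; a sketch (the package name and pip flags are the standard ones, but verify for your setup):

```shell
# Rebuild llama-cpp-python from source with cuBLAS enabled and the
# MMQ (quantized matmul) kernels forced on, avoiding the FP16 path.
CMAKE_ARGS='-DLLAMA_CUBLAS=on -DLLAMA_CUDA_FORCE_MMQ=on' \
  pip install --force-reinstall --no-cache-dir llama-cpp-python
```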
@thanhtantran :
* By default, privategpt offloads all layers to the GPU. In your case, all 33 layers are offloaded. You can adjust that number in the file [`llm_component.py:45`][0]
* Running...
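A minimal sketch of the knob in question. `n_gpu_layers` is the llama.cpp parameter name; the model path and the value 20 here are purely illustrative:

```python
# Illustrative only: the kind of kwargs privategpt's llm_component.py
# passes to its llama.cpp-based loader.
llm_kwargs = {
    "model_path": "models/your-model.gguf",  # hypothetical path
    "n_gpu_layers": 20,  # offload 20 of the 33 layers; -1 (or 33) offloads all
}
print(llm_kwargs["n_gpu_layers"])
```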
@charlyjna : Multi-GPU crashes in "Query Docs" mode for me as well. It works in "LLM Chat" mode, though. I have an RTX 4000 Ada SFF and a P40.
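Until the crash is fixed, one hedged workaround (a standard CUDA mechanism, not specific to privategpt) is to hide all but one GPU from the process so the multi-GPU path is never taken:

```shell
# Pin the process to GPU index 0; run `nvidia-smi -L` to pick the index.
# Replace the launch command with however you start privategpt.
CUDA_VISIBLE_DEVICES=0 python -m private_gpt
```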