LaaZa
I think the issue is simply that the example does not use the instruct format specific to Vicuna, you possibly have different sampling parameters, and the stopping criteria is not...
The prompt should match the instruction template as you advance through question rounds. End the prompt with the assistant's turn, like `### Assistant:`, so it knows to answer as itself...
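A rough sketch of what I mean, assuming the `### Human:` / `### Assistant:` style template; the exact strings depend on the Vicuna version, so check the model card:

```python
# Sketch only: assembles a Vicuna-style multi-turn prompt.
# The template strings are assumptions, verify against the model card.
def build_prompt(history, new_question):
    prompt = ""
    for question, answer in history:
        prompt += f"### Human: {question}\n### Assistant: {answer}\n"
    # End with the assistant's turn so the model answers as itself.
    prompt += f"### Human: {new_question}\n### Assistant:"
    return prompt

prompt = build_prompt(
    history=[("What is GPTQ?", "A post-training quantization method for LLMs.")],
    new_question="Does it hurt accuracy much?",
)
```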
If you only have 4 GB of VRAM you are never going to be able to load a 13B model onto the GPU. Look into trying GGML models.
You don't. It depends on your GPU. But you can try GGML models since they run on the CPU and use system RAM. It's not going to be fast, though.
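If you want to try GGML outside the webui, here is a minimal sketch with llama-cpp-python; the file path and thread count are placeholders:

```python
# Sketch: run a GGML checkpoint on the CPU with llama-cpp-python.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/vicuna-13b.ggmlv3.q4_0.bin",  # placeholder GGML file
    n_ctx=2048,   # context length
    n_threads=8,  # CPU threads to use
)

output = llm("### Human: Hello!\n### Assistant:", max_tokens=128)
print(output["choices"][0]["text"])
```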
A 13B LLaMA model will not fit in 24 GB of VRAM. You need to either load it in 8-bit with `load_in_8bit` or use a GPTQ-quantized model, which is usually 4-bit.
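For the 8-bit route, a minimal sketch with the Hugging Face transformers + bitsandbytes stack; the model id is a placeholder:

```python
# Sketch: load a 13B model in 8-bit so it fits in 24 GB of VRAM.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "huggyllama/llama-13b"  # placeholder model id

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    load_in_8bit=True,   # quantize weights to 8-bit on load (needs bitsandbytes)
    device_map="auto",   # place layers on the GPU automatically
)
```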
Just set the `--load-in-8bit` flag or check that option in the webui when you load the model. For GPTQ quantization you could use [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ) or [GPTQ-for-LLaMa](https://github.com/qwopqwop200/GPTQ-for-LLaMa). Currently textgen uses the latter...
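With AutoGPTQ, loading an already-quantized model looks roughly like this; the repo name and checkpoint settings are assumptions, so check the model's README:

```python
# Sketch: load a pre-quantized 4-bit GPTQ checkpoint with AutoGPTQ.
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

quantized_model_dir = "TheBloke/vicuna-13B-GPTQ"  # placeholder repo name

tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir, use_fast=True)
model = AutoGPTQForCausalLM.from_quantized(
    quantized_model_dir,
    device="cuda:0",
    use_safetensors=True,  # depends on how the checkpoint was saved
)
```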
Setting gpu-memory will offload any excess to RAM. It may make inference much slower. Honestly, the very tiny degradation from quantization is usually well worth the tradeoff.
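Roughly what the gpu-memory setting does under the hood is cap VRAM per device and let accelerate spill the remaining layers to CPU RAM; the numbers here are placeholders:

```python
# Sketch: cap GPU 0 and let overflow layers land in system RAM.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "huggyllama/llama-13b",                   # placeholder model id
    device_map="auto",
    max_memory={0: "20GiB", "cpu": "48GiB"},  # layers beyond the cap go to RAM
)
```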
Honestly though, if we take into account the immense reduction in memory requirements, the differences in perplexity scores are insignificantly small. This is a comparison table from llama.cpp (different from GPTQ...
No, the quantization does a lot of smart things to minimize the negative impact. Every format differs in bit allocation (what the bits are used for), so we can't just "chop it...
> @LaaZa wow yeah, those differences are pretty minor. I'm not very familiar with perplexity though. Is it able to reflect model accuracy well?

It measures how well the model...
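For context, perplexity is just the exponentiated average negative log-likelihood the model assigns to each token of a held-out text, so lower means the model predicts the text better:

$$
\mathrm{PPL}(x_1,\dots,x_N) = \exp\!\left(-\frac{1}{N}\sum_{i=1}^{N}\log p\!\left(x_i \mid x_{<i}\right)\right)
$$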