fenixlam
update: OK, I tested it with 12 threads in both GPT4All() and LlamaCppEmbeddings. Its speed increased hugely, from 527 seconds to 216 seconds. I see there is GPT4All() and RetrievalQA.from_chain_type()....
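For reference, here is a minimal sketch of where those thread settings go, assuming the LangChain wrappers used here expose an `n_threads` argument; the model paths and the Chroma vector store are placeholders, not from the original setup:

```python
# Minimal sketch: pass n_threads=12 to both the LLM and the embedding model.
# Model paths and the vector store directory below are hypothetical examples.
from langchain.llms import GPT4All
from langchain.embeddings import LlamaCppEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA

# Giving both components 12 threads is what cut the run from ~527 s to ~216 s in my test.
llm = GPT4All(model="models/ggml-gpt4all-j.bin", n_threads=12)                            # hypothetical path
embeddings = LlamaCppEmbeddings(model_path="models/ggml-model-q4_0.bin", n_threads=12)    # hypothetical path

db = Chroma(persist_directory="db", embedding_function=embeddings)                        # hypothetical store
qa = RetrievalQA.from_chain_type(llm=llm, chain_type="stuff", retriever=db.as_retriever())

print(qa.run("What does the document say about thread count?"))
```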
I tested the application and it requires 32GB of VRAM to run, so it is best to run the program on RunPod or another GPU-provisioned server.
My Anaconda Python 3.10.9 does not have this problem... so that could be a workaround?
I think the most important thing is... how did you find these parameters?? I have tested it by generating a text paragraph and it looks good!
@wal58 For the 13B model, you can just download it and load it the same way as the 7B model with the parameter > ./chat -m [your 13B model]. I remember the 13B model's base...
Try using ./chat -n 4096?
Add a parameter to let the user choose between GPU and CPU... even DirectML XD would be the best choice.
https://huggingface.co/Pi3141/alpaca-lora-13B-ggml/tree/main ?