cpu: No Errors But No Answer
What are the min requirements for CPU? I ran this on an i7 with 32GB RAM, but it ran for over 20 min with no answer, then I killed it. I used the .env default models as noted in the README. Any tricks to get this up and working on CPU only?
Hi @quantumalchemy. I recommend the llama one. How long it takes depends on how much you fill the context. At the end of the CPU section in the README, https://github.com/h2oai/h2ogpt/blob/main/README.md#cpu, we recommend what to try when the system is slower or has less memory.
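For reference, a minimal CPU-only setup along those lines might look like the sketch below; the .env key and model filename come from this thread, and the exact launch invocation is an assumption based on the commands shown later.

```bash
# Sketch of a CPU-only run, assuming the .env default discussed here:
# point the llama model path at the GGML file, then start the UI.
echo "model_path_llama=WizardLM-7B-uncensored.ggmlv3.q8_0.bin" >> .env
python generate.py --base_model=llama
```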
I have 64GB RAM and an i9, and the llama models run fine, but the more context you fill them with, the longer they will take.
Yeah I ran the default model_path_llama=WizardLM-7B-uncensored.ggmlv3.q8_0.bin
Not getting an answer... will try again. Thanks
Ok. Yes, that's our default case; it's part of our test suite too.
Are you using CLI or UI?
The UI... I will try on a GPU tomorrow, thanks
Also windows or linux?
The CLI way is described here: https://github.com/h2oai/h2ogpt#cli-chat
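For example, a CPU chat session can be started with the flags that appear later in this thread:

```bash
# CLI chat on CPU with the llama GGML model from this thread.
python generate.py --base_model=llama --cli=True
```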
Ok, using the CLI (chat) worked with ggml-gpt4all-j-v1.3-groovy.bin, but not with llama WizardLM-7B-uncensored.ggmlv3.q8_0.bin. Tried (q4) Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_1.bin; that worked, but in chat only, and it won't work in the GUI. When I created a db for a PDF and then ran
python generate.py --base_model=llama --cli=True --langchain_mode=UserData
I get:
error: requested results 1000 is greater than number of elements in index 266, updating n_results = 266
llama_tokenize: too many tokens
I will try other models, but I would like an uncensored one. Any suggestions? It seems it only works on q4 models.
Streaming in the UI for CPU models isn't supported yet. Perhaps you didn't wait long enough?
error: requested results 1000 is greater than number of elements in index 266, updating n_results = 266
Not actually an error; you can ignore it. The index only has 266 chunks, so the retrieval request is just clamped to that.
llama_tokenize: too many tokens
I see, probably the input is large and even the tokenizer can't handle input text beyond some limit. Will review.
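In the meantime, one hypothetical workaround (not an h2ogpt option) is to shrink the source document before ingesting it, so the retrieved context stays under the tokenizer limit. This sketch assumes poppler's pdftotext is installed, that documents are ingested from a user_path folder as in the README, and report.pdf stands in for the user's PDF:

```bash
# Hypothetical workaround, not an h2ogpt feature: convert the PDF to
# plain text and trim it before building the db, so the context the
# retriever hands to llama_tokenize stays smaller.
pdftotext report.pdf - | head -c 50000 > user_path/report_trimmed.txt
```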