cpu: No Errors But No Answer
What are the min requirements for CPU? I ran this on an i7 with 32GB RAM, but it ran for over 20 min with no answer, then I killed it. I used the .env default models as noted in the README. Any tricks to get this up and working on CPU only?
Hi @quantumalchemy. I recommend the llama one. How long it takes depends on how much you fill the context. At the end of the CPU section in the README, https://github.com/h2oai/h2ogpt/blob/main/README.md#cpu, we recommend what to try when the system is slower or has less memory.
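For reference, a minimal CPU-only setup along those lines might look like the sketch below; the .env key and model filename come from this thread, and the exact launch invocation is an assumption based on the commands shown later.

```bash
# Sketch of a CPU-only run, assuming the .env default discussed here:
# point the llama model path at the GGML file, then start the UI.
echo "model_path_llama=WizardLM-7B-uncensored.ggmlv3.q8_0.bin" >> .env
python generate.py --base_model=llama
```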
I have 64GB RAM and an i9, and the llama models run fine, but the more context you fill them with, the longer they will take.
Yeah I ran the default model_path_llama=WizardLM-7B-uncensored.ggmlv3.q8_0.bin
Not getting an answer... will try again. Thanks
Ok. Yes, that's our default case; it's part of our test suite too.
Are you using CLI or UI?
The UI... I will try on a GPU tomorrow, thanks
Also windows or linux?
The CLI way is described here: https://github.com/h2oai/h2ogpt#cli-chat
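For example, a CPU chat session can be started with the flags that appear later in this thread:

```bash
# CLI chat on CPU with the llama GGML model from this thread.
python generate.py --base_model=llama --cli=True
```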
Ok, using the CLI (chat) worked with ggml-gpt4all-j-v1.3-groovy.bin, but not with llama WizardLM-7B-uncensored.ggmlv3.q8_0.bin. Tried (q4) Wizard-Vicuna-7B-Uncensored.ggmlv3.q4_1.bin; that worked, but in chat only, and it won't work in the GUI. When I created a db for a PDF and then ran
python generate.py --base_model=llama --cli=True --langchain_mode=UserData
I get:
error: requested results 1000 is greater than number of elements in index 266, updating n_results = 266
llama_tokenize: too many tokens
I will try other models, but I would like an uncensored one. Any suggestions? It seems it only works on q4 models.
Streaming in the UI for CPU models isn't supported yet. Perhaps you didn't wait long enough?
error: requested results 1000 is greater than number of elements in index 266, updating n_results = 266
Not actually an error; you can ignore it. The index only has 266 chunks, so the retrieval request is just clamped to that.
llama_tokenize: too many tokens
I see, probably the input is large and even the tokenizer can't handle input text beyond some limit. Will review.
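In the meantime, one hypothetical workaround (not an h2ogpt option) is to shrink the source document before ingesting it, so the retrieved context stays under the tokenizer limit. This sketch assumes poppler's pdftotext is installed, that documents are ingested from a user_path folder as in the README, and report.pdf stands in for the user's PDF:

```bash
# Hypothetical workaround, not an h2ogpt feature: convert the PDF to
# plain text and trim it before building the db, so the context the
# retriever hands to llama_tokenize stays smaller.
pdftotext report.pdf - | head -c 50000 > user_path/report_trimmed.txt
```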