Paul Hoskinson


> I just installed using this method; setup.py didn't work for me: [#177 (comment)](https://github.com/oobabooga/text-generation-webui/issues/177#issuecomment-1464844721). It's pre-assembled.

That may work for Windows, but my issue is on Linux.

See this comment for a possible workaround: https://github.com/qwopqwop200/GPTQ-for-LLaMa/issues/59#issuecomment-1475041809
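For reference, the usual manual-build path for GPTQ-for-LLaMa on Linux looks roughly like this; a sketch only, assuming a working CUDA toolkit and a matching PyTorch install (the exact steps are in the linked comment):

```shell
# Sketch: build GPTQ-for-LLaMa's CUDA kernel from source instead of a prebuilt wheel
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
pip install -r requirements.txt
# Compile and install the quant_cuda extension (requires nvcc on PATH)
python setup_cuda.py install
```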

I ran into this issue while building a basic Redux + Feathers application. For now I'm using the base Feathers client combined with Redux and redux-saga.

Speeds on an old 4c/8t Intel i7 with the above prompt/seed, 7B model, n=128:

| Threads | ms/token |
| --- | --- |
| **t=4** | **165** |
| t=5 | 220 |
| t=6 | 188 |
| t=7 | 168 |
| **t=8** | **154** |

13B: **t=4 314...
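For context, a thread-count sweep like this can be run with llama.cpp's `main` binary, which prints per-token timings at the end of each run; a rough sketch, where the model path and prompt are placeholders rather than the ones used above:

```shell
# Hypothetical sweep over thread counts; -s fixes the seed, -n the number of tokens to generate
for t in 4 5 6 7 8; do
  ./main -m ./models/7B/ggml-model-q4_0.bin -p "your prompt here" -s 1 -n 128 -t "$t"
done
```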

> This might be a dumb question but is there any way to reduce the memory requirements even if it increases inference time?

Currently no, other than adding a lot...

I'm getting the same results on a 4c/8t i7 (Skylake) on Linux (7B model, 4-bit): -t 4 is several times faster than -t 8.

Upon further testing, it seems that if anything else is using the CPU, e.g. Firefox open playing a video, -t 8 slows to a crawl while -t...
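One mitigation worth trying here (my suggestion, not something from the thread) is pinning llama.cpp to the physical cores so background load contends less with inference; a sketch, assuming cores 0-3 map to distinct physical cores on this 4c/8t part:

```shell
# Pin the process to cores 0-3 and match -t to the pinned core count
# (assumption: on this CPU, logical cores 0-3 are separate physical cores)
taskset -c 0-3 ./main -m ./models/7B/ggml-model-q4_0.bin -p "your prompt here" -n 128 -t 4
```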

> BTW can you think of any way to make the GPU help out? It isn't doing anything at the moment

This project is CPU-only; however, there's a different...

In the case of llama.cpp, when a long prompt is given, you can see it print the provided prompt back word by word at a slow rate even before it starts...

> The following seems to work for me:
>
> ```shell
> # ... as before
> cd GPTQ-for-LLaMa
> pip install -r requirements.txt
> # Add the following line:...