Paul Hoskinson


> I just installed using this method; setup.py didn't work for me: [#177 (comment)](https://github.com/oobabooga/text-generation-webui/issues/177#issuecomment-1464844721). It's pre-assembled.

That may work for Windows, but my issue is on Linux.

See this comment for a possible workaround: https://github.com/qwopqwop200/GPTQ-for-LLaMa/issues/59#issuecomment-1475041809
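For reference, the usual manual-build path for GPTQ-for-LLaMa on Linux looks roughly like this; a sketch only, assuming a working CUDA toolkit and a matching PyTorch install (the exact steps are in the linked comment):

```shell
# Sketch: build GPTQ-for-LLaMa's CUDA kernel from source instead of a prebuilt wheel
git clone https://github.com/qwopqwop200/GPTQ-for-LLaMa
cd GPTQ-for-LLaMa
pip install -r requirements.txt
# Compile and install the quant_cuda extension (requires nvcc on PATH)
python setup_cuda.py install
```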

I ran into this issue while building a basic Redux + Feathers application. For now I'm using the base Feathers client combined with Redux and redux-saga.

Speeds on an old 4c/8t Intel i7 with the above prompt/seed, 7B model, n=128:

| Threads | ms/token |
| --- | --- |
| **t=4** | **165** |
| t=5 | 220 |
| t=6 | 188 |
| t=7 | 168 |
| **t=8** | **154** |

13B: **t=4 314...
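For context, a thread-count sweep like this can be run with llama.cpp's `main` binary, which prints per-token timings at the end of each run; a rough sketch, where the model path and prompt are placeholders rather than the ones used above:

```shell
# Hypothetical sweep over thread counts; -s fixes the seed, -n the number of tokens to generate
for t in 4 5 6 7 8; do
  ./main -m ./models/7B/ggml-model-q4_0.bin -p "your prompt here" -s 1 -n 128 -t "$t"
done
```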

> This might be a dumb question but is there any way to reduce the memory requirements even if it increases inference time?

Currently no, other than adding a lot...

I'm getting the same results on a 4c/8t i7 (Skylake) on Linux (7B model, 4-bit): -t 4 is several times faster than -t 8.

Upon further testing, it seems that if anything else is using the CPU, e.g. Firefox open playing a video, -t 8 slows to a crawl while -t...
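One mitigation worth trying here (my suggestion, not something from the thread) is pinning llama.cpp to the physical cores so background load contends less with inference; a sketch, assuming cores 0-3 map to distinct physical cores on this 4c/8t part:

```shell
# Pin the process to cores 0-3 and match -t to the pinned core count
# (assumption: on this CPU, logical cores 0-3 are separate physical cores)
taskset -c 0-3 ./main -m ./models/7B/ggml-model-q4_0.bin -p "your prompt here" -n 128 -t 4
```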

> BTW can you think of any way to make the GPU help out? It isn't doing anything at the moment

This project is CPU-only; however, there's a different...

In the case of llama.cpp, when a long prompt is given, you can see it print the provided prompt back word by word at a slow rate even before it starts...

> The following seems to work for me:
>
> ```shell
> # ... as before
> cd GPTQ-for-LLaMa
> pip install -r requirements.txt
> # Add the following line:...