dblacknc

Results: 20 comments by dblacknc

The bug report is that response N is printed in the verbose log only after prompt N+1 is entered. Sorry for any confusion from my also mentioning that the entire history is...

If I run "netstat -an | fgrep :80" and watch for the TIME_WAIT connections to go away (no output), it'll then start. It has been a very long time since...
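The manual check above (rerun netstat until no TIME_WAIT line mentions port 80) could be automated roughly as follows. This is only a sketch under assumptions: it expects Linux-style `netstat -an` output with `address:port` fields, the function names `time_wait_lines` and `wait_until_clear` are made up for illustration, and the match is stricter than `fgrep :80` (which would also match `:8080`).

```python
import subprocess
import time


def time_wait_lines(netstat_output: str, port: int = 80) -> list[str]:
    """Return netstat lines showing a TIME_WAIT socket that involves the given port."""
    needle = f":{port}"
    matches = []
    for line in netstat_output.splitlines():
        fields = line.split()
        if "TIME_WAIT" not in fields:
            continue
        # An address field ends with ":<port>"; endswith avoids matching :8080 for :80.
        if any(field.endswith(needle) for field in fields):
            matches.append(line)
    return matches


def wait_until_clear(port: int = 80, poll_seconds: float = 2.0) -> None:
    """Poll `netstat -an` until no TIME_WAIT sockets remain on the port (hypothetical helper)."""
    while True:
        result = subprocess.run(
            ["netstat", "-an"], capture_output=True, text=True, check=True
        )
        if not time_wait_lines(result.stdout, port):
            return
        time.sleep(poll_seconds)
```

Calling `wait_until_clear(80)` before launching the service would block until the lingering connections are gone; the parsing is kept in a separate function so it can be checked against captured output without running netstat.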

Looks like with --gpu-memory 7100MiB it starts pushing some layers to the CPU. I think part of the challenge is that, with the llava extension active, the GPU already has 1.8-2.0 GB...

The dividing line is 7100: that value pushes a few layers to the CPU, and 7200 doesn't. However, with 7200 (and above) it overruns the 12 GB of VRAM on many prompts.

OK - confirmed, --pre_layer allows CPU offload to work with GPTQ. I found a couple of other related things: --auto-devices seems to be unconditionally enabled. I can omit it and...

OK - thanks for the explanation, and for the pointer to the LLaVA README for more info. Closing, as it looks like it's not a bug, and I'll assume for now the...

This model: wojtab_llava-13b-v0-4bit-128g

I reported the same in issue #1632.

I'm just trying RWKV and it's working well for me. Not running in a container though. I'm using an Ubuntu 22.04 KVM VM with 64 GB RAM and passing through...