dblacknc
I am using Warp in K8s mode with 22 clients. A mix test with ~250M 4 KB objects in total, or about 11.3M per client, appeared to finish the prepare phase normally...
The vicuna-13b-int4 model is running very well on my RTX 3060. Out of curiosity I added --cpu to try running there for a performance comparison. On the first prompt, a...
### Describe the bug I expected that adding --verbose to the server.py args would print prompts and responses after they are issued, preferably without duplication, but I get the...
### Describe the bug Git pull as of this morning. When I change interface options in the UI and click restart, the web browser seems to refresh immediately...
### Describe the bug This is related to #1636 - trying to work around VRAM usage on my 12 GB RTX 3060 with the 4-bit model, trying --gpu-memory 7...
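For context, a minimal sketch of the kind of invocation this report describes. The model name is a placeholder; `--gpu-memory` is the server.py flag for capping GPU allocation in GiB, and whether it is honored for 4-bit models is exactly what the report questions:

```shell
# Hypothetical invocation: cap GPU allocation at ~7 GiB on a 12 GB card.
# Model name is a placeholder; this sketch only illustrates the flag usage
# described in the report, not a confirmed workaround.
python server.py --model vicuna-13b-int4 --wbits 4 --gpu-memory 7
```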
### Describe the bug I'm trying to use the llava extension with my 12 GB RTX 3060 card. It's working reasonably well, and I notice the VRAM usage idles at about...
### Describe the bug At times an OpenAssistant model will seemingly prompt and reply to itself after answering a basic question. It's definitely in Open Assistant mode and I see...
**Description** When using an RWKV model, the loading strategy must be given on the command line and is often model-size-specific. It can also be a relatively complex argument. Add...
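To illustrate why the argument is complex, a hedged sketch of a model-size-specific RWKV invocation. The model name and the strategy string are illustrative assumptions, not tuned recommendations; the strategy syntax (device, precision, per-layer split) follows the rwkv library's convention:

```shell
# Hypothetical example: load the first 10 layers as int8-quantized fp16 on
# the GPU, and the remaining layers as plain fp16 on the GPU. Values are
# illustrative; the right split depends on model size and available VRAM.
python server.py --model RWKV-4-Raven-7B --rwkv-strategy "cuda fp16i8 *10 -> cuda fp16"
```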