okwinds
Same on Mac, it doesn't work. Configuring http://host.docker.internal:11434/v1, http://127.0.0.1:11434/v1, or http://localhost:11434/v1 all fail.
The same exception occurs under load (running 32B-W4A16 on an RTX 4090): --dtype auto \ --gpu-memory-utilization 0.95 \ --block-size 16 \ --max-model-len 5200 \ --kv-cache-dtype auto \ --max-num-batched-tokens 5200 \ --max-seq-len-to-capture 5200 \...
> @ashgold thanks for pointing this, here is a PR to fix it: #8417 I also expanded the preemption test so it actually does the log stats...
This feature is already supported, so this issue can be closed.