Ze Wang comments


Results: 2 comments of Ze Wang

Error: Warmup(Generation("Not enough memory to handle 1024 prefill tokens. You need to decrease `--max-batch-prefill-tokens`")

docker run --gpus '"device=0,1,2,3"' \
  --shm-size 1g \
  -p 8081:80 \
  -v /home/unionlab001/Model/qwen-72b:/data ghcr.io/predibase/lorax:latest \
  --model-id /data/Qwen1_5-72B-Chat \
  --trust-remote-code \
  --quantize bitsandbytes-nf4 \
  --max-batch-prefill-tokens 300 \
  --max-input-length 200 \
  --max-total-tokens...

[BUG] Accelerating Qwen1.5-series models with vllm 0.3.3 raises ImportError: cannot import name 'model_schema' from 'pydantic.schema'

@xqxls Has this issue been resolved?