binghan1227 comments

binghan1227

Try adding `"gpu_memory_utilization": 0.95,` to `engine_args` in `weclone/core/inference/vllm_infer.py`.
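
For illustration, here is a rough sketch of what that change might look like. The actual contents of `engine_args` in `vllm_infer.py` are not shown in this thread, so the `model` and `dtype` entries and the helper `with_gpu_memory_utilization` are hypothetical placeholders; only the `gpu_memory_utilization` key and its value come from the suggestion above. In vLLM this setting is the fraction of GPU memory the engine is allowed to use, and it defaults to 0.9.

```python
# Hypothetical stand-in for the `engine_args` dict in vllm_infer.py.
# The real keys in WeClone may differ; these two are placeholders.
engine_args = {
    "model": "path/to/your/model",  # placeholder, not a real path
    "dtype": "auto",
}


def with_gpu_memory_utilization(args: dict, fraction: float = 0.95) -> dict:
    """Return a copy of `args` with gpu_memory_utilization set to `fraction`."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("gpu_memory_utilization must be in the range (0, 1]")
    updated = dict(args)  # copy so the original dict is left untouched
    updated["gpu_memory_utilization"] = fraction
    return updated


engine_args = with_gpu_memory_utilization(engine_args)
print(engine_args)
# The dict would then be handed to vLLM wherever the file builds the engine,
# for example something along the lines of vllm.LLM(**engine_args).
```

In practice, adding the single `"gpu_memory_utilization": 0.95,` line directly inside the existing dict is enough; the helper only makes the sketch self-contained. If other processes are already using the same GPU, a value this high may not leave them enough memory, so a lower value might be needed.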