Yihua Cheng
Good question. I thought vLLM uses a docker image for testing (though their unit tests take forever to run). Maybe another solution is to pre-install all the needed non-python...
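For the pre-install idea, a minimal sketch of what that base image could look like — the base image tag and package list here are illustrative assumptions, not what vLLM or LMCache actually use:

```
# Sketch only: bake non-Python dependencies into a reusable base image
# so CI test runs don't reinstall them every time. Base image tag and
# packages below are assumptions for illustration.
FROM nvidia/cuda:12.4.1-devel-ubuntu22.04
RUN apt-get update && apt-get install -y --no-install-recommends \
        build-essential cmake git \
    && rm -rf /var/lib/apt/lists/*
# CI jobs then layer the Python test dependencies on top of this image.
```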
This looks like a bug. We will try to fix it soon.
@YaoJiayi I found something hard-coded here: https://github.com/LMCache/LMCache/blob/e9cb5189a82329877402c11c14728e4c0df1afa1/lmcache/integration/vllm/vllm_v1_adapter.py#L403 Is this the root cause?
@cotol7 NIXL depends on UCX, and running `pip install nixl` by itself does not install UCX. You can use the NIXL or Dynamo docker image, or the cuda-dl-base docker image, or compile...
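If you go the compile route, a sketch under assumptions (standard UCX source-build steps; the install prefix and library-path export are arbitrary choices for illustration):

```
# Build UCX from source first, make its libraries discoverable,
# then install NIXL on top of it.
git clone https://github.com/openucx/ucx.git
cd ucx
./autogen.sh
./contrib/configure-release --prefix=/usr/local/ucx
make -j"$(nproc)" && make install
export LD_LIBRARY_PATH=/usr/local/ucx/lib:$LD_LIBRARY_PATH
pip install nixl
```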
We are also working on providing a Dockerfile that includes a correct NIXL installation. It should land in #578.
Thanks @IRONICBo , looking forward to your contribution!
I'm traveling these days. Will come back to this PR after this Wednesday.
Fixed the crash problem. Now lm_eval runs with the following output on the llama-3.1-8B model:

```
|Tasks|Version|     Filter     |n-shot|  Metric   |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.7933|±  |0.0234|
```
...
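For reference, a representative invocation (a sketch — the exact flags used for this run are not shown above) using lm-evaluation-harness with the vLLM backend and 5-shot gsm8k, matching the table:

```
lm_eval --model vllm \
  --model_args pretrained=meta-llama/Llama-3.1-8B \
  --tasks gsm8k \
  --num_fewshot 5
```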
@njhill @robertgshaw2-redhat Now the crashing & hanging issue should be fixed.