Bellk17
I would second this. My use case is IO-bound, so it would be much simpler if there were a way to update the concurrency instead of spinning up new processes.
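As a sketch of what "update the concurrency" could look like without restarting workers, here is an adjustable asyncio semaphore. This is purely illustrative, not any particular library's API: raising the limit releases extra permits immediately, and lowering it retires permits as running tasks finish.

```python
import asyncio

class AdjustableSemaphore:
    """Concurrency limit that can grow at runtime; shrinking takes effect
    as in-flight tasks complete. Sketch only, not a library API."""

    def __init__(self, value: int):
        self._sem = asyncio.Semaphore(value)
        self.value = value

    async def __aenter__(self):
        await self._sem.acquire()
        return self

    async def __aexit__(self, *exc):
        self._sem.release()

    def resize(self, new_value: int):
        delta = new_value - self.value
        self.value = new_value
        for _ in range(max(delta, 0)):
            self._sem.release()  # grant extra permits immediately
        if delta < 0:
            async def retire(n):
                for _ in range(n):
                    await self._sem.acquire()  # permits vanish as they free up
            asyncio.get_running_loop().create_task(retire(-delta))

async def main():
    limit = AdjustableSemaphore(2)

    async def job():
        async with limit:
            await asyncio.sleep(0.01)  # stand-in for an IO-bound task

    limit.resize(8)  # raise concurrency mid-run, no process restart
    await asyncio.gather(*(job() for _ in range(20)))
    return limit.value

final = asyncio.run(main())
```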
Also hitting the same issue. Bouncing the service seems to fix it, but the Redis PING comes back fine, so my health check is giving a false positive.
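Since PING alone gives a false positive here, one workaround is a deeper health check that also does a SET/GET round-trip. The sketch below is an assumption about how you might structure it (the `FakeRedis` stub is only there so the snippet runs without a live server; with redis-py you would pass a real client):

```python
import time

class FakeRedis:
    """Minimal in-memory stand-in so the sketch runs without a server."""
    def __init__(self):
        self._d = {}
    def ping(self):
        return True
    def set(self, key, value, ex=None):
        self._d[key] = value
        return True
    def get(self, key):
        return self._d.get(key)

def deep_health_check(client, key="healthcheck:probe"):
    """PING can succeed while the data path is broken; also verify that a
    freshly written value reads back."""
    if not client.ping():
        return False
    token = str(time.time_ns())
    client.set(key, token, ex=10)  # short TTL so probes don't accumulate
    return client.get(key) == token

print(deep_health_check(FakeRedis()))  # → True
```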
I'm seeing the same issue. Catching the import error gives `No module named 'vllm._C'`. I'm also seeing warnings during install: ``` ... CMake Warning at /home/tensorwave/install_vllm/venv/lib/python3.10/site-packages/torch/share/cmake/Torch/TorchConfig.cmake:22 (message): static library kineto_LIBRARY-NOTFOUND not...
Found a fix: in my case the problem was running the benchmark test from the source directory. During installation, the `_C` module is compiled into the site-packages directory of the pip...
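To confirm you are hitting this particular failure mode, you can check whether a local source tree would shadow the installed package: a `vllm/` directory in the current working directory wins on `sys.path` and lacks the compiled `_C` extension. The helper below is an illustrative sketch (the function name is mine, not vLLM's); the demo builds a throwaway checkout-like layout so it runs anywhere.

```python
import os
import tempfile

def shadowing_package(name, path_entries):
    """Return the first path entry containing a source directory named
    `name` that would shadow an installed site-packages copy, else None."""
    for entry in path_entries:
        candidate = os.path.join(entry or os.getcwd(), name)
        if os.path.isfile(os.path.join(candidate, "__init__.py")):
            return candidate
    return None

# Demo: a checkout-like layout where ./vllm/ shadows site-packages.
with tempfile.TemporaryDirectory() as repo:
    pkg = os.path.join(repo, "vllm")
    os.makedirs(pkg)
    open(os.path.join(pkg, "__init__.py"), "w").close()
    hit = shadowing_package("vllm", [repo])
    print(hit is not None)  # → True: the local tree would win
```

If the check fires, running the benchmark from any directory outside the checkout (or printing `vllm.__file__` to see which copy was imported) should make the error go away.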
I would also love to test this out; we are looking for a no/low-code solution to add to our frontend.
Seeing the same behavior. Torch can save and load concurrently quite efficiently, but safetensors save time grows linearly with concurrency.
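A simple way to demonstrate this scaling difference is to time N concurrent saves and watch whether wall-clock time stays flat (saves overlap) or grows with N (saves serialize). The harness below is a generic sketch using a sleep as a stand-in for `torch.save` / `safetensors.torch.save_file`, so it runs without either library installed; swap `fake_save` for a real save call to reproduce the measurement.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def timed_concurrent(fn, n_workers):
    """Run `fn` once per worker concurrently; return total wall-clock seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [pool.submit(fn) for _ in range(n_workers)]
        for f in futures:
            f.result()
    return time.perf_counter() - start

def fake_save():
    # Stand-in for a save call; sleep releases the GIL like real file IO.
    time.sleep(0.05)

for n in (1, 2, 4):
    # Flat times across n => saves overlap; linear growth => serialization.
    print(n, round(timed_concurrent(fake_save, n), 3))
```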
Same issue.