
Results 59 comments of JaheimLee

> Yeah. I have multiple CUDA package. And I manually set CUDA_HOME to cuda-11.7 as shown above both in my .bashrc and your build_pytorch_blade.sh. Why it still uses cuda 11.0?

Here is the output:

```
(base) lijinghui@idc-op-dev-gpu-001:/data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages/torch/lib$ ldd libtorch_cuda.so
	linux-vdso.so.1 (0x00007fff38354000)
	libc10_cuda.so => /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages/torch/lib/./libc10_cuda.so (0x00007f0247b7e000)
	libcudart-e409450e.so.11.0 => /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages/torch/lib/./libcudart-e409450e.so.11.0 (0x00007f020c7dc000)
	libnvToolsExt-847d78f2.so.1 => /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages/torch/lib/./libnvToolsExt-847d78f2.so.1 (0x00007f020c5d1000)
	libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f020c3b2000)
	libc10.so => /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages/torch/lib/./libc10.so...
```
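For context, pinning a build to one specific toolkit as described in the quote above usually looks something like the following. This is a sketch, and the `/usr/local/cuda-11.7` path is an assumption about where the 11.7 toolkit happens to be installed; adjust it to the actual location on your machine:

```shell
# Assumed install location of the CUDA 11.7 toolkit -- adjust to your system.
export CUDA_HOME=/usr/local/cuda-11.7

# Put this toolkit's nvcc first on PATH so the build picks it up
# instead of whichever other CUDA installation comes first.
export PATH="$CUDA_HOME/bin:$PATH"

# Make its runtime libraries visible to the linker and loader as well.
export LD_LIBRARY_PATH="$CUDA_HOME/lib64:${LD_LIBRARY_PATH:-}"
```

Note that, as the `ldd` output below shows, this only controls which toolkit a *build* uses; a prebuilt wheel that bundles its own `libcudart` is unaffected.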

> Here is the output
>
> ```
> (base) lijinghui@idc-op-dev-gpu-001:/data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages/torch/lib$ ldd libtorch_cuda.so
> linux-vdso.so.1 (0x00007fff38354000)
> libc10_cuda.so => /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages/torch/lib/./libc10_cuda.so (0x00007f0247b7e000)
> libcudart-e409450e.so.11.0 => /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages/torch/lib/./libcudart-e409450e.so.11.0 (0x00007f020c7dc000)
> libnvToolsExt-847d78f2.so.1 => /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages/torch/lib/./libnvToolsExt-847d78f2.so.1...
> ```

I noticed this [issue](https://github.com/pytorch/pytorch/issues/73829). And maybe it can't be solved now.
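The giveaway in the `ldd` output is the version suffix on the bundled runtime: the wheel ships its own `libcudart-<hash>.so.11.0`, regardless of what `CUDA_HOME` points at. A small self-contained sketch of extracting that version, using lines from the output above as sample data:

```python
import re

# Sample lines taken from the ldd output shown in the comment above.
ldd_output = """\
libc10_cuda.so => /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages/torch/lib/./libc10_cuda.so (0x00007f0247b7e000)
libcudart-e409450e.so.11.0 => /data/miniconda3/envs/ljh_BladeDISC/lib/python3.10/site-packages/torch/lib/./libcudart-e409450e.so.11.0 (0x00007f020c7dc000)
"""

# PyTorch wheels bundle the runtime as libcudart-<hash>.so.<version>;
# the trailing version is the CUDA runtime the wheel was built against.
m = re.search(r"libcudart\S*\.so\.([\d.]+)", ldd_output)
cuda_runtime = m.group(1) if m else None
print(cuda_runtime)  # -> 11.0
```

In practice you would pipe real output in, e.g. `ldd libtorch_cuda.so | grep libcudart`, but the parsing is the same.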

> I actually successfully installed flash-attn 2.5.7 with vllm 0.4.1 and it can be detected by vllm (Using FlashAttention backend). But the performance remains the same (there is not...

Is it related to [this](https://github.com/vllm-project/vllm/issues/12529) issue?

Same problem when there are multiple requests at the same time.

Will V1 support FlashInfer in the future?
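On the V0 engine, FlashInfer can be requested via the documented `VLLM_ATTENTION_BACKEND` environment variable; whether V1 honors this is exactly the open question here. A sketch, with the model name purely as an example:

```shell
# Ask vLLM to use the FlashInfer attention backend (V0 engine).
# Requires the flashinfer package to be installed alongside vllm.
export VLLM_ATTENTION_BACKEND=FLASHINFER

# Hypothetical invocation; the model name is an example, not from this thread.
# vllm serve meta-llama/Llama-3.1-8B-Instruct
```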

@WoosukKwon Hi, the vllm nightly wheel doesn't have a `v1/spec_decode` directory.

```
Traceback (most recent call last):
  File "/data/lijinghui/uv_projects/.venv/lib/python3.12/site-packages/gunicorn/arbiter.py", line 608, in spawn_worker
    worker.init_process()
  File "/data/lijinghui/uv_projects/.venv/lib/python3.12/site-packages/uvicorn/workers.py", line 75, in init_process
    super().init_process()
  File...
```