Chanjun
add an unpack error type: extra types
> Hi @zhyncs, thanks for the interest and benchmarking. Several things here: FlashInfer is not turned on by default; it can only be enabled with the environment variable `VLLM_ATTENTION_BACKEND=FLASHINFER`. ...
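As a minimal sketch of the opt-in described above (only the variable name and value come from the comment; the launch command is a hypothetical placeholder), enabling FlashInfer amounts to exporting the backend selector before starting vLLM:

```shell
# FlashInfer is opt-in: export the backend selector before launching vLLM.
export VLLM_ATTENTION_BACKEND=FLASHINFER

# Hypothetical launch command; substitute your own model and server flags.
# python -m vllm.entrypoints.openai.api_server --model <your-model>

echo "$VLLM_ATTENTION_BACKEND"   # prints FLASHINFER
```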
> Hi @MichoChan, you can use https://github.com/NVIDIA-Merlin/HugeCTR/blob/main/sparse_operation_kit/sparse_operation_kit/experiment/test/function_test/tf1/lookup/lookup_sparse_distributed_test.py to have a try

OK, I'll give it a try and test.
> Hi @MichoChan, can you elaborate on the CUDA version at which the header file is missing? If you use an NGC container, e.g., `nvcr.io/nvidia/pytorch:22.12-py3` with `CUDA 11.8`, there is no...
> Hi @MichoChan, if you have to use CUDA 11.2, you can leverage cub or thrust.
>
> ```shell
> /usr/local/cuda/include# find -name "*scan*h"
> ...
> ```
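A small sketch of that header check, assuming the default toolkit location `/usr/local/cuda` (adjust the path for your install); it simply lists whatever cub/thrust scan headers the toolkit ships:

```shell
# Look for cub/thrust scan headers in the CUDA toolkit include dir.
CUDA_INC=/usr/local/cuda/include   # assumed default install path
if [ -d "$CUDA_INC" ]; then
  find "$CUDA_INC" -name "*scan*h"
else
  echo "no CUDA toolkit at $CUDA_INC"
fi
```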
> I haven’t seen this before with 32 tokens, which is odd.

I used the same benchmark script. Can you try to install from the main branch? Can you show...
I guess it is a multi-GPU problem.