Chanjun
add an unpack error type: extra types
> Hi @zhyncs, thanks for the interest and benchmarking. Several things here: FlashInfer is not turned on by default; it can only be enabled with the environment variable `VLLM_ATTENTION_BACKEND=FLASHINFER`. ...
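As a minimal sketch of the opt-in described above (only the variable name and value come from the comment; the launch command is a hypothetical placeholder), enabling FlashInfer amounts to exporting the backend selector before starting vLLM:

```shell
# FlashInfer is opt-in: export the backend selector before launching vLLM.
export VLLM_ATTENTION_BACKEND=FLASHINFER

# Hypothetical launch command; substitute your own model and server flags.
# python -m vllm.entrypoints.openai.api_server --model <your-model>

echo "$VLLM_ATTENTION_BACKEND"   # prints FLASHINFER
```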
> Hi @MichoChan, you can use https://github.com/NVIDIA-Merlin/HugeCTR/blob/main/sparse_operation_kit/sparse_operation_kit/experiment/test/function_test/tf1/lookup/lookup_sparse_distributed_test.py to have a try

OK, I'll give it a try and test.
> Hi @MichoChan, can you elaborate on the CUDA version at which the header file is missing? If you use an NGC container, e.g., `nvcr.io/nvidia/pytorch:22.12-py3` with `CUDA 11.8`, there is no...
> Hi @MichoChan, if you have to use CUDA 11.2, you can leverage cub or thrust.
>
> ```shell
> /usr/local/cuda/include# find -name "*scan*h"
> ...
> ```
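A small sketch of that header check, assuming the default toolkit location `/usr/local/cuda` (adjust the path for your install); it simply lists whatever cub/thrust scan headers the toolkit ships:

```shell
# Look for cub/thrust scan headers in the CUDA toolkit include dir.
CUDA_INC=/usr/local/cuda/include   # assumed default install path
if [ -d "$CUDA_INC" ]; then
  find "$CUDA_INC" -name "*scan*h"
else
  echo "no CUDA toolkit at $CUDA_INC"
fi
```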
> I haven’t seen this before with 32 tokens, which is odd.

I used the same benchmark script. Can you try to install from the main branch? Can you show...
I guess it is a multi-GPU problem.