Lvjinhong
> Hmm, what GPU do you use? Normally most people will add --medvram or --xformers (for some GPUs) to allow it to run on 6 or even 4 GB of VRAM....
With --medvram it will look like this. I think I can try the previous version.
> @beginlner thanks for the info. Reading https://github.com/microsoft/DeepSpeed-Kernels/blob/main/dskernels/inf_flash_attn/blocked_flash/flash_fwd_kernel.h as well. So far, is there any progress on enabling speculative decoding for vLLM? Additionally, I'm wondering if the implementation of this...
When can this branch be merged? In the version I am currently using, there is:
```python
op=xops.fmha.MemoryEfficientAttentionFlashAttentionOp[0] if (is_hip()) else None,
```
Is the Flash operation supported only for HIP?
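For context, here is a minimal sketch of how that `op` argument flows into xformers' attention entry point. The `is_hip()` helper, the tensor shapes, and the forward-only call below are my own assumptions for illustration, not code taken from vLLM:

```python
# Sketch only: op=None lets xformers auto-dispatch to the best available kernel,
# while indexing [0] on MemoryEfficientAttentionFlashAttentionOp forces the
# flash-attention forward op (the tuple is (forward op, backward op)).
import torch
import xformers.ops as xops

def is_hip() -> bool:
    # Hypothetical helper mirroring the check in the snippet above.
    return torch.version.hip is not None

# Assumed shapes: [batch, seq_len, num_heads, head_dim], fp16 on GPU.
q = torch.randn(1, 128, 8, 64, dtype=torch.float16, device="cuda")
k = torch.randn(1, 128, 8, 64, dtype=torch.float16, device="cuda")
v = torch.randn(1, 128, 8, 64, dtype=torch.float16, device="cuda")

op = xops.fmha.MemoryEfficientAttentionFlashAttentionOp[0] if is_hip() else None
out = xops.memory_efficient_attention_forward(q, k, v, op=op)
print(out.shape)  # same layout as the inputs
```

So, if I read it right, on non-HIP devices the snippet simply falls back to xformers' automatic kernel selection rather than disabling flash attention outright.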
Very good work! May I ask if it can be merged into the main branch soon?
> I think this is not that TGI is better, but that vLLM results are somewhat misaligned with Hugging Face's transformers.
>
> Not sure if it's a bug or a feature, but...
I seem to have the same issue; I've been waiting for about ten minutes and it's still the same. Judging from the returned error, it shouldn't be a problem on my end, right?...
On GitHub, it looks like this:
Hmm, okay. On Linux you need to first `mv` it to a .7z extension, and then you can extract it with `7z x`.
I've tried installing flash-attn with `pip install flash-attn==2.2.1` and `flash-attn==2.3`. The installation ultimately succeeded. However, when I attempt distributed training with Megatron-LM, I...
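For what it's worth, a quick sanity check I run after installing, a sketch assuming flash-attn 2.x on a CUDA GPU (the shapes and dtype below are arbitrary choices of mine, not from Megatron-LM):

```python
# Verify that the installed flash-attn wheel imports and its kernel actually runs.
import torch
import flash_attn
from flash_attn import flash_attn_func

print("flash-attn version:", flash_attn.__version__)

# flash_attn_func expects (batch, seq_len, num_heads, head_dim) fp16/bf16 tensors on GPU.
q = torch.randn(2, 256, 8, 64, dtype=torch.float16, device="cuda")
k = torch.randn(2, 256, 8, 64, dtype=torch.float16, device="cuda")
v = torch.randn(2, 256, 8, 64, dtype=torch.float16, device="cuda")

out = flash_attn_func(q, k, v, causal=True)
print("output shape:", out.shape)  # expected: (2, 256, 8, 64)
```

In my experience, if this passes but Megatron-LM still fails, the mismatch is usually between the PyTorch/CUDA version the wheel was built against and the one the training environment actually uses.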