Hanshi Sun

Results: 25 comments of Hanshi Sun

Thanks! But how can I make it work? Do you have an example command?

I tried setting num_gpus to 2, but it seems to place an identical copy of the model on each GPU at the same time.

Yes, you are right! Thanks! And the performance of single-GPU inference with KV-cache offload is really nice! But I have a question: I found that this [fork of transformers](https://github.com/tjruwase/transformers/tree/kvcache-offload-cpu) actually allocates a buffer for KV...
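
For context, here is a minimal sketch of what such a pre-allocated CPU KV buffer might look like. The shapes, the `offload_layer_kv` helper, and the pinned-memory layout are illustrative assumptions on my part, not the fork's actual code:

```python
import torch

# Illustrative shapes only (assumptions, not the fork's actual configuration).
num_layers, batch, num_heads, max_seq_len, head_dim = 32, 1, 32, 4096, 128

# Pre-allocate one pinned CPU buffer for the whole KV cache up front;
# pinned memory allows asynchronous device<->host copies.
kv_cpu = torch.empty(
    (num_layers, 2, batch, num_heads, max_seq_len, head_dim),
    dtype=torch.float16,
    pin_memory=True,
)

def offload_layer_kv(layer_idx: int, key: torch.Tensor, value: torch.Tensor, pos: int) -> None:
    """Hypothetical helper: copy one decoding step's per-layer KV (on GPU)
    into the corresponding slot of the CPU buffer."""
    kv_cpu[layer_idx, 0, :, :, pos].copy_(key, non_blocking=True)
    kv_cpu[layer_idx, 1, :, :, pos].copy_(value, non_blocking=True)
```

The point of allocating the full buffer once (rather than growing it step by step) would be to avoid repeated host allocations and to keep the memory pinned for fast async transfers.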

Thank you very much! Nice work!

Hello, thanks for your interest in our work! In our provided implementation, we set $\gamma_1 = 1$ because we observed that the performance is nearly the same for $\gamma_1 =...

Hello, may I ask how much memory your device has? You can try decreasing the prefill length from 124928 to 122880 to see if it still hits OOM. The code can run...
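
As a quick sanity check on those numbers (assuming the prefill is processed in 2048-token chunks, which is my assumption here), the suggested reduction corresponds to dropping exactly one chunk:

```python
# Assumption: prefill proceeds in 2048-token chunks; both lengths are exact multiples.
assert 124928 == 61 * 2048
assert 122880 == 60 * 2048  # i.e., one 2048-token chunk fewer
```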

What is your `transformers` version? Can you set it to `transformers==4.37.2`, since the `apply_rotary_pos_emb` API changed in recent versions?
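
If it helps, a small sketch for failing fast at runtime when a different `transformers` version is installed (the pinned version string is simply the one suggested above):

```python
import transformers

# Fail fast if a newer transformers with the changed apply_rotary_pos_emb API is installed.
assert transformers.__version__ == "4.37.2", (
    f"expected transformers==4.37.2, got {transformers.__version__}"
)
```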

Yeah I am using CUDA 12.1. Here is my `flash_attn` version:

```
>>> import torch
>>> torch.__version__
'2.2.1+cu121'
>>> import flash_attn
>>> flash_attn.__version__
'2.5.7'
```

I have added a FAQ in the README :)