ytxiong
This looks like flash_attn was not installed correctly. Please double-check whether flash_attn is actually installed in your current environment.
Does `import flash_attn` succeed? `rotary_emb` is a CUDA extension operator that ships with flash_attn, so this should not be a version issue. In principle, once flash_attn is installed successfully, this package can be imported.
For installing flash_attn, you can refer to [this guide](https://github.com/InternLM/InternLM/blob/main/doc/en/install.md).
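A minimal sanity check, assuming flash_attn was installed following the InternLM guide above (which also builds the rotary CUDA extension); run it in the same environment that raises the error:

```python
# Sketch of a quick check; module names follow the thread above.
import flash_attn
print(flash_attn.__version__)  # should print the installed version

# rotary_emb is the CUDA extension built alongside flash_attn; if this import
# fails, the extension was not built for the current environment.
import rotary_emb
print(rotary_emb.__file__)
```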
@zucchini-nlp thank you very much. I see that in verl, `position_ids[0]` is passed to flash attention; I am not sure that is correct.
> Minimal code snippet:
>
> from vllm import LLM
> llm = LLM(
>     model=YOUR_MODEL_PATH,
>     pipeline_parallel_size=2,
> )

@jeejeelee Thank you. You mean [this](https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/llm.py#L52)? I didn't see the...
> See: https://github.com/vllm-project/vllm/blob/main/vllm/engine/arg_utils.py#L112

OK, thank you very much, I will give it a try.
@jeejeelee I have tried to use PP with the LLM API; however, I ran into this error:

> raise NotImplementedError(
> NotImplementedError: Pipeline parallelism is only supported through AsyncLLMEngine as performance will be severely...
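For reference, a minimal sketch of the AsyncLLMEngine route that error points to (assuming a vLLM version where pipeline parallelism is only supported through the async engine; `YOUR_MODEL_PATH` is the same placeholder as in the snippet above):

```python
import asyncio

from vllm import SamplingParams
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

# Build the async engine with pipeline parallelism enabled.
engine = AsyncLLMEngine.from_engine_args(
    AsyncEngineArgs(
        model="YOUR_MODEL_PATH",   # placeholder from the snippet above
        pipeline_parallel_size=2,
    )
)

async def run(prompt: str) -> str:
    params = SamplingParams(max_tokens=64)
    final = None
    # generate() yields RequestOutput objects as tokens stream in;
    # keep the last one for the completed text.
    async for output in engine.generate(prompt, params, request_id="req-0"):
        final = output
    return final.outputs[0].text

print(asyncio.run(run("Hello, my name is")))
```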
Thank you. I am new to vLLM; is this online inference?
So, what about offline inference? Can the async engine be used for offline inference?
@jeejeelee Okay, thank you.