wentao

51 comments by wentao

> From the log that you provided, there is an OOM error. This could be because of several reasons:
> 1. The model weight is too large to be loaded --> please...

> > > From the log that you provided, there is an OOM error. This could be because of several reasons:
> > >
> > > 1. The model weight...

> 24 GB should be sufficient for the 3B model, as the thinker weights only require about 6 GB of memory. In my reproduction, I also limited the memory size...
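The ~6 GB figure for the thinker weights follows from simple arithmetic; a minimal sketch, assuming the 3B parameter count and fp16/bf16 storage (2 bytes per parameter):

```python
def weight_memory_gib(n_params: float, bytes_per_param: int = 2) -> float:
    """Approximate weight footprint in GiB (ignores activations and KV cache)."""
    return n_params * bytes_per_param / 1024**3

# ~3e9 parameters at 2 bytes each comes out to roughly 5.6 GiB, i.e. "about 6 GB"
print(round(weight_memory_gib(3e9), 1))
```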

```
# stage config for running qwen2.5-omni with architecture of OmniLLM.
stage_args:
- stage_id: 0
  runtime:
    process: true  # Run this stage in a separate process
    devices: "0"   # Visible devices...
```

@tzhouam Could you share your startup command?

> I have successfully started the online service for Qwen 2.5 Omni 3B with this command and stage config, using 2 cards that each have less than 24 GB of memory. BUT you need...

> you can use vllm-omni main branch + vllm v0.11.0 and use tests/multi_stages/stage_configs/qwen2_5_omni_ci.yaml
>
> ```
> vllm serve /mnt/data/models/Qwen/Qwen2___5-Omni-3B \
>   --served-model-name Qwen2.5-Omni-3B \
>   --host 0.0.0.0 \
>   ...
> ```

> > you can use vllm-omni main branch + vllm v0.11.0 and use tests/multi_stages/stage_configs/qwen2_5_omni_ci.yaml
> >
> > ```
> > vllm serve /mnt/data/models/Qwen/Qwen2___5-Omni-3B \
> >   --served-model-name Qwen2.5-Omni-3B \
> >   ...
> > ```

> cc [@david6666666](https://github.com/david6666666) [@tzhouam](https://github.com/tzhouam) Everything has been resolved, thank you! However, I found that the multimodal model is very token-intensive, consumes a lot of cache, and performs poorly with longer...
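The cache pressure described above comes from the per-token KV-cache cost: every multimodal token occupies cache in each layer. A rough sketch of the arithmetic, using illustrative dimensions (the layer/head/dtype numbers below are assumptions for a generic ~3B GQA model, not the actual Qwen2.5-Omni architecture):

```python
def kv_bytes_per_token(layers: int, kv_heads: int, head_dim: int,
                       dtype_bytes: int = 2) -> int:
    """Bytes of KV cache one token occupies: a K and a V vector per layer."""
    return 2 * layers * kv_heads * head_dim * dtype_bytes

# Illustrative dims (assumed, not Qwen2.5-Omni's real config):
per_token = kv_bytes_per_token(layers=36, kv_heads=4, head_dim=128)
print(per_token)  # bytes per cached token

# A few thousand image/audio tokens per request adds up quickly:
print(per_token * 20_000 / 1024**3)  # GiB for 20k cached tokens
```

This is why long multimodal contexts exhaust cache much faster than text-only ones: each image or audio clip expands into thousands of tokens, and each token carries this fixed per-layer cost.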