amy-why-3459

3 issues by amy-why-3459

### Motivation.

Training and serving an omni-modal model that possesses both strong offline multimodal understanding and real-time audio-visual interaction capabilities is highly challenging. The difficulties mainly arise in the following...

### Your current environment

```python
from vllm_omni.entrypoints.omni import Omni

if __name__ == "__main__":
    omni = Omni(model="Tongyi-MAI/Z-Image-Turbo")
    prompt = "a cup of coffee on the table"
    images = omni.generate(prompt)
    images[0].save("coffee.png")
```

```text
Your...
```

bug

### Motivation.

The metric information is unclear: prefill and decode times cannot be obtained from it. {'e2e_requests': 1, 'e2e_total_time_ms': 497714.5309448242, 'e2e_sum_time_ms': 497714.20526504517, 'e2e_total_tokens': 0, 'e2e_avg_time_per_request_ms': 497714.20526504517, 'e2e_avg_tokens_per_s': 0.0, 'wall_time_ms': 497714.5309448242, 'final_stage_id': 2,...
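
As a rough illustration of the gap this issue describes, the sketch below takes only the fields visible in the truncated snippet and reports which timing keys are present. The helper name `summarize_metrics` and the key substrings `prefill` and `decode` are assumptions made for illustration, not part of the project's API; the remaining fields of the real dict are elided in the source and are not reproduced here.

```python
# Only the fields visible in the truncated snippet; the rest is elided in the source.
metrics = {
    "e2e_requests": 1,
    "e2e_total_time_ms": 497714.5309448242,
    "e2e_sum_time_ms": 497714.20526504517,
    "e2e_total_tokens": 0,
    "e2e_avg_time_per_request_ms": 497714.20526504517,
    "e2e_avg_tokens_per_s": 0.0,
    "wall_time_ms": 497714.5309448242,
    "final_stage_id": 2,
}


def summarize_metrics(m):
    """Hypothetical helper: derive readable values and list any phase timing keys."""
    total_s = m["e2e_total_time_ms"] / 1000.0
    avg_ms = m["e2e_sum_time_ms"] / m["e2e_requests"]
    tokens_per_s = m["e2e_total_tokens"] / total_s if total_s > 0 else 0.0
    # Assumed naming: a key mentioning "prefill" or "decode" would count as phase timing.
    phase_keys = [k for k in m if "prefill" in k or "decode" in k]
    return {
        "total_s": total_s,
        "avg_ms_per_request": avg_ms,
        "tokens_per_s": tokens_per_s,
        "phase_keys": phase_keys,
    }


summary = summarize_metrics(metrics)
print(summary)
```

For the visible fields only, `phase_keys` comes back empty and the token throughput is zero because `e2e_total_tokens` is 0, so the snippet as shown carries end-to-end timings but no per-phase breakdown.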