ms-swift Qwen2.5 VL dataset sampler based on modality length

Open lucasjinreal opened this issue 6 months ago • 2 comments

Qwen2.5 VL accepts images of varying sizes, so when training on relatively large images the number of image tokens differs from sample to sample and from batch to batch. The default sampling logic does not take the per-item multimodal length into account, so some batches end up much larger than others and cause OOM.

Is there any way to support sampling based on modality length during training?
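
To make the request concrete, here is a rough sketch of the kind of sampler I have in mind. It is not ms-swift's API. The class name `TokenBudgetBatchSampler`, the helper `estimate_image_tokens`, and the `max_tokens_per_batch` parameter are all hypothetical names for illustration. The sketch assumes each sample's total length (text tokens plus image tokens) can be computed ahead of time, and it approximates image tokens as one token per 28x28 pixel block, ignoring any min/max pixel resizing.

```python
import random
from typing import Iterator, List, Sequence


def estimate_image_tokens(height: int, width: int, factor: int = 28) -> int:
    """Rough visual token estimate: one token per factor x factor pixel block.

    Assumption for illustration only; the real count depends on the
    processor's resizing rules.
    """
    return max(1, round(height / factor)) * max(1, round(width / factor))


class TokenBudgetBatchSampler:
    """Hypothetical batch sampler that groups samples of similar length.

    A batch is closed once its padded cost (longest sample * batch size)
    would exceed max_tokens_per_batch. A single sample that is larger than
    the budget still gets a batch of its own.
    """

    def __init__(self, lengths: Sequence[int], max_tokens_per_batch: int,
                 shuffle: bool = True, seed: int = 0) -> None:
        self.lengths = list(lengths)
        self.max_tokens = max_tokens_per_batch
        self.shuffle = shuffle
        self.seed = seed
        self.epoch = 0

    def set_epoch(self, epoch: int) -> None:
        # Changes the batch order between epochs while staying reproducible.
        self.epoch = epoch

    def _build_batches(self) -> List[List[int]]:
        # Sort indices by length so neighbours need little padding.
        order = sorted(range(len(self.lengths)), key=lambda i: self.lengths[i])
        batches: List[List[int]] = []
        current: List[int] = []
        current_max = 0
        for idx in order:
            new_max = max(current_max, self.lengths[idx])
            if current and new_max * (len(current) + 1) > self.max_tokens:
                batches.append(current)
                current, new_max = [], self.lengths[idx]
            current.append(idx)
            current_max = new_max
        if current:
            batches.append(current)
        if self.shuffle:
            # Shuffle whole batches, not samples, to keep the grouping intact.
            random.Random(self.seed + self.epoch).shuffle(batches)
        return batches

    def __iter__(self) -> Iterator[List[int]]:
        return iter(self._build_batches())

    def __len__(self) -> int:
        return len(self._build_batches())


# Example: total length per sample = text tokens + estimated image tokens.
lengths = [100, 5000, 120, 4800, 90, 3000]
sampler = TokenBudgetBatchSampler(lengths, max_tokens_per_batch=6000, shuffle=False)
for batch in sampler:
    print(batch, [lengths[i] for i in batch])
```

Since the object just yields lists of indices, it could in principle be passed as `batch_sampler` to a PyTorch `DataLoader`. The batch size then varies per step, so short samples are packed together and very large images end up in small batches instead of blowing up memory.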

Apr 20 '25 09:04 lucasjinreal
