Qwen2.5 VL dataset sampler based on modality length

Open lucasjinreal opened this issue 6 months ago • 2 comments

Qwen2.5 VL accepts images of varying sizes, so when training on relatively large images the number of image tokens differs widely from item to item and from batch to batch. The default sampler logic does not take each item's multimodal length into account, which can cause OOM.

Is there any way to support sampling based on modality length during training?
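To make the request concrete, here is a rough sketch of what I mean, not an existing ms-swift API. The function name `token_budget_batches` and the `max_tokens_per_batch` parameter are made up for illustration, and it assumes the total length of every item (text tokens plus image tokens) has already been computed up front.

```python
import random
from typing import List, Sequence


def token_budget_batches(
    lengths: Sequence[int],
    max_tokens_per_batch: int,
    seed: int = 0,
) -> List[List[int]]:
    """Group sample indices so each padded batch stays under a token budget.

    lengths[i] is the total length of sample i (text + image tokens),
    assumed to be precomputed by the caller. Hypothetical helper.
    """
    # Sort indices by length so neighbouring samples have similar sizes.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])

    batches: List[List[int]] = []
    current: List[int] = []
    for idx in order:
        # Ascending order means the incoming sample is the longest so far,
        # so the padded cost is its length times the new batch size.
        padded_cost = lengths[idx] * (len(current) + 1)
        if current and padded_cost > max_tokens_per_batch:
            batches.append(current)
            current = []
        # A single sample over the budget still gets a batch of its own.
        current.append(idx)
    if current:
        batches.append(current)

    # Shuffle batch order so training does not see lengths in ascending order.
    random.Random(seed).shuffle(batches)
    return batches
```

With something like this, items with large images would end up in small batches and short items in large ones, so peak memory is bounded by the token budget rather than by a fixed batch size.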

lucasjinreal · Apr 20 '25 09:04