I am trying to train a 1B model on a 32 GB V100 GPU, but FlashAttention is not supported there. Has anyone trained a 1B model on a V100? A100s are expensive.
Addition: my task is a simple single-image classification task. I found that the 1B model outperforms CLIP by a large margin, so I want to train the 1B model on a V100.
You can set `_attn_implementation` to `eager` in the config to disable FlashAttention.
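A minimal sketch of what that could look like when loading the model from Hugging Face, assuming the `OpenGVLab/InternVL2_5-1B` checkpoint and that the config nests `vision_config` / `llm_config` sub-configs (those names are assumptions, not verified against this repo):

```python
import torch
from transformers import AutoConfig, AutoModel

path = "OpenGVLab/InternVL2_5-1B"  # assumed checkpoint name
config = AutoConfig.from_pretrained(path, trust_remote_code=True)

# Switch the attention implementation to eager so FlashAttention is never used.
config._attn_implementation = "eager"
# If the config nests sub-configs for the vision tower and the LLM, set them too.
if hasattr(config, "vision_config"):
    config.vision_config._attn_implementation = "eager"
if hasattr(config, "llm_config"):
    config.llm_config._attn_implementation = "eager"

model = AutoModel.from_pretrained(
    path,
    config=config,
    trust_remote_code=True,
    torch_dtype=torch.float16,  # V100 has no bfloat16 support
)
```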
Hello @Weiyun1025, thanks for your reply.
After this change I am still getting the following error:
RuntimeError: FlashAttention only supports Ampere GPUs or newer.
For your reference, I have shared my current notebook in case you can help: https://github.com/kachhadiyaraj15/internvl_testing/blob/main/inrternvl_2_5_flash_attention_error.ipynb
Did you solve this in the end? When using eager mode, don't the attention mask dimensions mismatch? It seems the dataset's collate function needs to be modified.
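For what it's worth, a hypothetical sketch of such a collate function (not taken from this repo; the field names `input_ids` and `pixel_values` are assumptions) that pads variable-length sequences and produces a plain 2D attention mask, which eager attention implementations can expand internally:

```python
import torch

def collate_fn(batch, pad_token_id=0):
    # Pad every sample to the longest sequence in the batch and build a
    # 2D attention mask (1 = real token, 0 = padding).
    max_len = max(len(item["input_ids"]) for item in batch)
    input_ids, attention_mask = [], []
    for item in batch:
        ids = list(item["input_ids"])
        pad = max_len - len(ids)
        input_ids.append(ids + [pad_token_id] * pad)
        attention_mask.append([1] * len(ids) + [0] * pad)
    return {
        "input_ids": torch.tensor(input_ids, dtype=torch.long),
        "attention_mask": torch.tensor(attention_mask, dtype=torch.long),
        "pixel_values": torch.stack([item["pixel_values"] for item in batch]),
    }
```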