I am trying to train a 1B model on a 32 GB V100 GPU, but FlashAttention is not supported there. Has anyone trained a 1B model on a V100? A100s are expensive.
Addition: my task is a simple single-image classification task. I found that the 1B model outperforms CLIP by a large margin, so I want to train the 1B model on a V100.
You can set `_attn_implementation` to `eager` in the config to disable FlashAttention.
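A minimal sketch of what that could look like when loading the model from Hugging Face, assuming the `OpenGVLab/InternVL2_5-1B` checkpoint and that the config nests `vision_config` / `llm_config` sub-configs (those names are assumptions, not verified against this repo):

```python
import torch
from transformers import AutoConfig, AutoModel

path = "OpenGVLab/InternVL2_5-1B"  # assumed checkpoint name
config = AutoConfig.from_pretrained(path, trust_remote_code=True)

# Switch the attention implementation to eager so FlashAttention is never used.
config._attn_implementation = "eager"
# If the config nests sub-configs for the vision tower and the LLM, set them too.
if hasattr(config, "vision_config"):
    config.vision_config._attn_implementation = "eager"
if hasattr(config, "llm_config"):
    config.llm_config._attn_implementation = "eager"

model = AutoModel.from_pretrained(
    path,
    config=config,
    trust_remote_code=True,
    torch_dtype=torch.float16,  # V100 has no bfloat16 support
)
```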
Hello @Weiyun1025, thanks for your reply.
After this change I am still getting the following error:
RuntimeError: FlashAttention only supports Ampere GPUs or newer.
For your reference, I have shared my current notebook in case you can help: https://github.com/kachhadiyaraj15/internvl_testing/blob/main/inrternvl_2_5_flash_attention_error.ipynb
Did you solve this in the end? When using eager mode, don't the attention mask dimensions mismatch? It seems the dataset's collate function needs to be modified.
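For what it's worth, a hypothetical sketch of such a collate function (not taken from this repo; the field names `input_ids` and `pixel_values` are assumptions) that pads variable-length sequences and produces a plain 2D attention mask, which eager attention implementations can expand internally:

```python
import torch

def collate_fn(batch, pad_token_id=0):
    # Pad every sample to the longest sequence in the batch and build a
    # 2D attention mask (1 = real token, 0 = padding).
    max_len = max(len(item["input_ids"]) for item in batch)
    input_ids, attention_mask = [], []
    for item in batch:
        ids = list(item["input_ids"])
        pad = max_len - len(ids)
        input_ids.append(ids + [pad_token_id] * pad)
        attention_mask.append([1] * len(ids) + [0] * pad)
    return {
        "input_ids": torch.tensor(input_ids, dtype=torch.long),
        "attention_mask": torch.tensor(attention_mask, dtype=torch.long),
        "pixel_values": torch.stack([item["pixel_values"] for item in batch]),
    }
```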