Results 18 comments of Marco

Cool, thx for the check. As for the gradients, I think that's quite normal since we are inserting a new module into a pretrained model.

The Qwen error is quite unusual and I think it might be carried over from other parts of the code. I'll investigate in the next few days.

Thx I'll do some tests tomorrow

Just pushed a new version (1.1.6). I've tested it and the training seems to go well (the gradient stabilizes in about 10-20 steps, provided that max_norm is enabled). Both Gemma and...

> > I haven't dived into the implementation details, it's just an on-the-fly fix. I am using transformers 4.40.0.dev0 now > > Where did you install 4.40.0.dev0 from? The latest release I can see is only 4.39.3. Does installing it avoid this problem? I ran into the same issue with Qwen1.5-72B on FSDP+QLoRA, and my Transformers version is 4.39.1. You can install it from source...

Nothing in particular, it went away with 1.1.6. What hardware are you running it on? There are many possible causes of a CUDA error of this type. Can you...

Seems pretty strange, both of them work on my end. What models are you using? What GPU do you use?

> Could you turn on the button to allow edits? I need to resolve the conflicts. > > Btw, I find that there still exists some problem...

Awesome, thanks, I'll try to debug it.

No problem, I'll do it tomorrow. Also, please send the error it gives you; I can't manage to reproduce it.