Fix assert on Lamb optimizers with BF16

Open loadams opened this issue 2 years ago • 6 comments

loadams avatar Oct 04 '23 17:10 loadams

@loadams, let's add a unit test as well.

tjruwase avatar Oct 06 '23 10:10 tjruwase

@loadams I just hit this assert when using Lamb with BF16. May I ask whether this PR will keep moving forward?

Liangliang-Ma avatar Aug 21 '24 08:08 Liangliang-Ma

Hi @Liangliang-Ma - apologies, I lost track of this PR. I'll work on getting this PR updated and merged.

loadams avatar Aug 21 '24 15:08 loadams

@Liangliang-Ma - does this branch resolve your issue? Or do you have any other feedback on the PR?

loadams avatar Aug 22 '24 17:08 loadams

Yes, this one works.

Liangliang-Ma avatar Aug 28 '24 01:08 Liangliang-Ma

The failing HPU tests are caused by a transformers issue, which should be fixed in transformers soon.

loadams avatar Aug 29 '24 23:08 loadams