
FSDPPrecision should support 16-true with a loss scaler

Open zaptrem opened this issue 1 year ago • 1 comment

Description & Motivation

https://github.com/Lightning-AI/pytorch-lightning/blob/f6fd046552a1504023cb3386a8a0df418a810e4f/src/lightning/fabric/plugins/precision/fsdp.py#L61

What if I want to use 16-true, but with a loss scaler? This is closer to DeepSpeed's default settings. With FSDP and 16-true but no loss scaler, my model doesn't converge. However, with FSDP, 16-true, and a loss scaler (after commenting out the assert and fixing the typo'ed line to `return scaler` instead of `return None`), my model converges.
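For context, a loss scaler guards fp16 training against gradient underflow/overflow by multiplying the loss before backward and unscaling (or skipping the step) afterwards, which is why 16-true without one can fail to converge. Here is a minimal numeric sketch of the dynamic-scaling idea, independent of PyTorch and Lightning (all names and default values are illustrative):

```python
import math

class DynamicLossScaler:
    """Toy dynamic loss scaler: grow the scale while steps succeed,
    back off whenever scaled gradients overflow to inf/nan."""

    def __init__(self, init_scale=2.0**16, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self.scale = init_scale
        self.growth_factor = growth_factor
        self.backoff_factor = backoff_factor
        self.growth_interval = growth_interval
        self._good_steps = 0

    def scale_loss(self, loss):
        # Scale the loss up so small fp16 gradients don't underflow to 0.
        return loss * self.scale

    def update(self, scaled_grads):
        # Return unscaled grads, or None if the step must be skipped.
        if any(math.isinf(g) or math.isnan(g) for g in scaled_grads):
            self.scale *= self.backoff_factor  # overflow: reduce the scale
            self._good_steps = 0
            return None
        self._good_steps += 1
        if self._good_steps % self.growth_interval == 0:
            self.scale *= self.growth_factor   # stable: try a larger scale
        return [g / self.scale for g in scaled_grads]
```

PyTorch's `torch.cuda.amp.GradScaler` (and FSDP's `ShardedGradScaler`) implement essentially this logic; the request here is simply to let `FSDPPrecision` accept such a scaler with `16-true` as well as `16-mixed`.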

Pitch

No response

Alternatives

No response

Additional context

No response

cc @borda

zaptrem avatar Jun 13 '24 02:06 zaptrem

I came here to open this issue, and you already did. I second this issue.

I fixed the package itself by adding

if scaler is not None and self.precision not in ("16-mixed", "16-true"):
    raise ValueError(f"`precision={self.precision!r}` does not use a scaler, found {scaler}.")

but it should be fixed upstream.
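The relaxed check above can be sketched as a standalone function (the function name and signature are illustrative, not Lightning's actual API):

```python
def validate_scaler(precision, scaler):
    """Allow a gradient scaler with either 16-mixed or 16-true precision;
    reject it for precisions that never produce fp16 gradients."""
    if scaler is not None and precision not in ("16-mixed", "16-true"):
        raise ValueError(
            f"`precision={precision!r}` does not use a scaler, found {scaler}."
        )
```

With this check, `16-true` plus a scaler is accepted, while e.g. `bf16-true` with a scaler still raises, since bf16 has enough dynamic range that loss scaling is unnecessary.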

oabuhamdan avatar Jun 13 '24 18:06 oabuhamdan