Yanli Zhao

Results: 5 comments by Yanli Zhao

For the native FSDP version, feel free to use "transformer_auto_wrap_policy" to wrap your model. Also try the new MixedPrecision config for bfloat16 :)
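A minimal sketch of what that suggestion could look like, assuming a model built from `nn.TransformerEncoderLayer` blocks (the layer class and the helper name `wrap_with_fsdp` are illustrative assumptions, not taken from the original thread). The policy and the mixed-precision config can be built anywhere; the actual FSDP wrap needs an initialized process group and, typically, a CUDA device.

```python
import functools

import torch
import torch.nn as nn
from torch.distributed.fsdp import MixedPrecision
from torch.distributed.fsdp.wrap import transformer_auto_wrap_policy

# Assumption: the model's repeated block is nn.TransformerEncoderLayer.
# Swap in your own transformer layer class here.
auto_wrap_policy = functools.partial(
    transformer_auto_wrap_policy,
    transformer_layer_cls={nn.TransformerEncoderLayer},
)

# Keep parameters, gradient reduction, and buffers in bfloat16.
bf16_policy = MixedPrecision(
    param_dtype=torch.bfloat16,
    reduce_dtype=torch.bfloat16,
    buffer_dtype=torch.bfloat16,
)


def wrap_with_fsdp(model: nn.Module) -> nn.Module:
    """Hypothetical helper: wrap `model` with native FSDP.

    Must be called after torch.distributed.init_process_group().
    """
    from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

    return FSDP(
        model,
        auto_wrap_policy=auto_wrap_policy,
        mixed_precision=bf16_policy,
    )
```

Printing the wrapped model afterwards is a quick way to check that each transformer layer, and the outermost module, ended up inside an FSDP unit.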

@SeanNaren thanks for trying PTD FSDP! 1. Would you please run `print(model)` after constructing the whole model? We found some bugs in Lightning: it seems the outermost model is not wrapped, it...

@rohan-varma do you know who can help with this?

I think the SPSD and CUDA-device-only assumption reflects the current state of FSDP. "we can provide earlier and cleaner error handling in the case the user forgets to set...

> @ezyang Sorry, I assume you mean use `torch.view_as_real`, but I'm unsure how to modify the above DDP example to use it, or do you mean for a custom distributed...