threewayhandshake

2 issues by threewayhandshake

Adafactor originally computes its own approximation of the second moment. But when GaLore is enabled, that approximation is computed from the gradient already shrunk by GaLore instead of the raw...

**Describe the bug** I get `AttributeError: Can't pickle local object 'FlopsProfiler.start_profile.<locals>.register_module_hooks.<locals>.start_time_hook'` when I run `torch.save` on a model that has been passed through `get_model_profile`. I checked the flops_profiler code and...
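For context, this class of error can be reproduced with the standard `pickle` module alone. The sketch below is not the flops_profiler code: `register_hooks` and the `state` dict are hypothetical stand-ins, assuming only that the profiler leaves a reference to a locally defined function on the object being saved.

```python
import pickle


def register_hooks(state):
    """Hypothetical stand-in for a profiler that attaches a hook to a model."""

    def start_time_hook():
        # Defined inside another function, so its qualified name contains
        # "<locals>" and pickle cannot look it up by name when saving.
        pass

    state["hook"] = start_time_hook


# A plain dict plays the role of the model here (an assumption for brevity).
state = {"weights": [1.0, 2.0]}
register_hooks(state)

try:
    pickle.dumps(state)
    pickled_ok = True
except (AttributeError, pickle.PicklingError):
    # The exact exception type and message vary between Python versions.
    pickled_ok = False

# Dropping the reference to the local function makes the object picklable again.
del state["hook"]
restored = pickle.loads(pickle.dumps(state))
```

If the real cause is the same, removing the leftover hooks before saving, or saving only the `state_dict`, would be the analogous workaround. That is an assumption, not something the truncated report confirms.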

bug
training