Federico Cassano
Results: 12 issues of Federico Cassano

### 🚀 The feature, motivation and pitch

The LFCE kernel allocates a `grad_weight` tensor: https://github.com/linkedin/Liger-Kernel/blob/a8fa3bb37850e89500261024ff47da0c626ab75f/src/liger_kernel/ops/fused_linear_cross_entropy.py#L47

This tensor then gets updated throughout the chunked loss calculation and finally used in the...

Hello, I am getting the following error whenever I scale up training to 512 GPUs while using FSDP2 + AdamWFP8 + BF16 stochastic rounding:

```
torch._dynamo.exc.TorchRuntimeError: Dynamo failed to run...
```

Labels: bug, distributed, high priority, optimizer, triaged, triage review