Paul Zhang
Summary: Introduce a BoundsCheckMode fused_param to control the TBE BoundsCheckMode. There is little reason to run bounds_check_indices in the inference use case (AIMP has it off by default: https://fburl.com/code/q8zhundg), and it causes issues...
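To illustrate the trade-off described above, here is a minimal pure-Python sketch of what an index bounds check with selectable modes might look like. The names `CheckMode` and `check_indices` are hypothetical stand-ins invented for this example; this is not the FBGEMM `bounds_check_indices` kernel or its `BoundsCheckMode` enum, only the general idea that a "none" mode skips the per-index work entirely.

```python
from enum import Enum
from typing import List


class CheckMode(Enum):
    """Hypothetical stand-in for a bounds-check mode (not the FBGEMM enum)."""
    FATAL = 0    # raise on the first out-of-range index
    WARNING = 1  # report the bad index and reset it to 0
    NONE = 2     # skip the check entirely (no per-index cost)


def check_indices(indices: List[int], num_rows: int, mode: CheckMode) -> List[int]:
    """Validate embedding row indices against a table with num_rows rows."""
    if mode is CheckMode.NONE:
        # Inference-style path: trust the inputs and do no work.
        return indices
    checked = []
    for idx in indices:
        if 0 <= idx < num_rows:
            checked.append(idx)
        elif mode is CheckMode.FATAL:
            raise IndexError(f"index {idx} out of range for {num_rows} rows")
        else:
            print(f"warning: index {idx} out of range, resetting to 0")
            checked.append(0)
    return checked
```

With `CheckMode.NONE` the input list is returned untouched, which is the behavior the summary argues for in inference.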
Summary: With _unwrap_kjt_for_cpu in place and the runtime_device recorded in QuantEmbeddingBag, the conditional for the input device in _unwrap_kjt is no longer needed, as this path should always record the CUDA device...
Differential Revision: D56728862
Summary: Registering custom ops for meta functionalization with ids can lead to hash collisions, resulting in wrong dimensions for a sparse module. This diff changes custom op naming to just...
Summary: As there are more instances of KJT/TorchRec data types used in PT2 IR, more edge cases are popping up. This diff fixes a bug and hardens the test framework...
Summary: Test serialization of FPEBC, correctness across serialization/deserialization, and registering the custom op in different environments that simulate training and inference.
Differential Revision: D57076081
Summary: Fix pyre-check failure, e.g. https://github.com/pytorch/torchrec/actions/runs/9036052208/job/24832104332
Reviewed By: henrylhtsang
Differential Revision: D57224368
Summary: Make KJT permute work in edge cases for torch.export (https://fb.workplace.com/groups/6829516587176185/posts/7206415192819654/?comment_id=7207715762689597&reply_comment_id=7210658785728628). This is also a cleaner solution than before, and it is necessary for enabling PT2 eager model processing on AIMP.
Differential Revision: D57162331
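For readers unfamiliar with what permuting a keyed jagged tensor involves, the following is a rough pure-Python sketch using plain lists. The function `permute_jagged` and its key-major layout are assumptions made for illustration; it is not TorchRec's KJT implementation and says nothing about how the diff handles torch.export. It only shows that reordering keys means moving each key's block of lengths and its variable-sized slice of values together, including the edge case where every slice is empty.

```python
from typing import List, Tuple


def permute_jagged(
    keys: List[str],
    lengths: List[int],
    values: List[int],
    order: List[int],
) -> Tuple[List[str], List[int], List[int]]:
    """Reorder keys of a jagged layout (hypothetical helper, key-major lists).

    lengths holds batch_size entries per key; values is the flat concatenation
    of every (key, sample) slice in the same key-major order.
    """
    batch_size = len(lengths) // len(keys) if keys else 0
    # Split lengths into one chunk per key.
    key_lengths = [
        lengths[i * batch_size:(i + 1) * batch_size] for i in range(len(keys))
    ]
    # starts[i] is where key i's values begin in the flat values list.
    starts = [0]
    for chunk in key_lengths:
        starts.append(starts[-1] + sum(chunk))

    new_keys: List[str] = []
    new_lengths: List[int] = []
    new_values: List[int] = []
    for i in order:
        new_keys.append(keys[i])
        new_lengths.extend(key_lengths[i])
        new_values.extend(values[starts[i]:starts[i + 1]])
    return new_keys, new_lengths, new_values
```

For example, with keys `["a", "b"]`, lengths `[2, 0, 1, 1]`, and values `[10, 11, 20, 21]`, the order `[1, 0]` moves key `b`'s two single-element slices in front of key `a`'s slices.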