torchrec
PyTorch domain library for recommendation systems
Summary: Registering custom ops for meta functionalization with ids can lead to hash collisions, resulting in wrong dimensions for a sparse module. This diff replaces custom op naming to just...
Summary: As there are more instances of KJT/TorchRec data types used in PT2 IR, more edge cases are popping up. This diff fixes a bug and hardens the test framework...
Summary: Biggest win in semi-sync pipeline. Post diff:
TrainPipelineBase | Runtime (P90): 10.098 s | Memory (P90): 8.418 GB
TrainPipelineSparseDist | Runtime (P90): 10.050 s | Memory (P90): 8.655 GB...
Summary: Test serialization of FPEBC, add correctness tests for serialization/deserialization, and test registering the custom op in different environments, simulating training and inference. Differential Revision: D57076081
Summary: Context: to be added. Differential Revision: D57139157
Summary: PrefetchTrainPipelineSparseDist: use the legacy TrainPipeline API; the newer internals will be refactored assuming the result is memory-neutral or better. Differential Revision: D57143337
Summary: As users highlighted, the TrainPipeline refactoring introduced a memory regression of ~2% due to the extra context management added for code readability. This results in higher peak memory (takes longer for a context to...
Summary: As titled. Differential Revision: D56863414
Summary: As titled. Differential Revision: D56970438