torchrec
PyTorch domain library for recommendation systems
Summary: This should get rid of the `libtorch_cpu.so: undefined reference to `logf@GLIBC_2.27'` errors (and similar ones) that we see in our C++ unit tests on GitHub CI. See: https://fb.workplace.com/groups/1405155842844877/permalink/23944202491846891/ Note there is another error...
I use `ManagedCollisionEmbeddingCollection` with `DistributedModelParallel` to store hashID embeddings during distributed training. An error occurs when setting `return_remapped_features=True` with **a single embedding table configuration**, but it resolves when a second...
Summary: TorchRec should not trace into SimpleFSDP, because doing so causes failures. We therefore added it as a leaf node. Differential Revision: D71450848
Using quantized embeddings with the float32 data type may lead to `Floating point exception (core dumped)`. We can reproduce this with the command `python test_quant.py` in the environment `torchrec==1.1.0+cu124, torch==2.6.0+cu124,...
There is an "alltoall" error when using row-wise sharding where some embedding bags use mean pooling while others use sum pooling. We can reproduce this using the following command: `torchrun --master_addr=localhost...
The `static_dict_gather` function encounters a precision issue when using `mc-ebc`. We can reproduce this with the command `torchrun --master_addr=localhost --master_port=49941 --nnodes=1 --nproc-per-node=2 test_mc_ebc_export.py` in the environment `torchrec==1.1.0+cu124, torch==2.6.0+cu124, fbgemm-gpu==1.1.0+cu124`. test_mc_ebc_export.py...
When using `mc-ebc` with `pooling=PoolingType.MEAN`, we encounter `TypeError: unsupported operand type(s) for /: 'Tensor' and 'NoneType'`. We can reproduce this using the following command: `torchrun --master_addr=localhost --master_port=49941 --nnodes=1 --nproc-per-node=2...