torchrec

PyTorch domain library for recommendation systems

455 torchrec issues

Differential Revision: D72187346

CLA Signed
fb-exported

Summary: This should get rid of the `libtorch_cpu.so: undefined reference to `logf@GLIBC_2.27'` errors (and similar) we see in our C++ unit tests on GitHub CI. See: https://fb.workplace.com/groups/1405155842844877/permalink/23944202491846891/ Note there is another error...

CLA Signed
fb-exported

I use `ManagedCollisionEmbeddingCollection` with `DistributedModelParallel` to store hash-ID embeddings during distributed training. An error occurs when `return_remapped_features=True` is set with **a single embedding table configuration**, but it goes away when a second...
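A minimal sketch of the single-table setup described above (table names, sizes, and MCH settings are illustrative, not taken from the issue). The issue reports the failure only after the module is wrapped in `DistributedModelParallel`, which is omitted here:

```python
import torch
from torchrec.modules.embedding_configs import EmbeddingConfig
from torchrec.modules.embedding_modules import EmbeddingCollection
from torchrec.modules.mc_embedding_modules import ManagedCollisionEmbeddingCollection
from torchrec.modules.mc_modules import (
    DistanceLFU_EvictionPolicy,
    ManagedCollisionCollection,
    MCHManagedCollisionModule,
)
from torchrec.sparse.jagged_tensor import KeyedJaggedTensor

# Single-table setup: the configuration the issue says fails.
tables = [
    EmbeddingConfig(
        name="t1", embedding_dim=8, num_embeddings=100, feature_names=["f1"]
    )
]
mc_ec = ManagedCollisionEmbeddingCollection(
    EmbeddingCollection(tables=tables, device=torch.device("cpu")),
    ManagedCollisionCollection(
        {
            "t1": MCHManagedCollisionModule(
                zch_size=100,
                device=torch.device("cpu"),
                eviction_interval=1,
                eviction_policy=DistanceLFU_EvictionPolicy(),
            )
        },
        tables,
    ),
    return_remapped_features=True,  # the reported trigger
)
# The issue wraps mc_ec in DistributedModelParallel before calling it.
features = KeyedJaggedTensor.from_lengths_sync(
    keys=["f1"],
    values=torch.tensor([11, 22, 33]),  # raw hash IDs, remapped by the MCH module
    lengths=torch.tensor([1, 1, 1]),
)
embeddings, remapped_kjt = mc_ec(features)
```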

Differential Revision: D71578741

CLA Signed
fb-exported

Summary: TorchRec should not trace into SimpleFSDP, which would cause failures, so we added it to the leaf nodes. Differential Revision: D71450848
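For context, a "leaf node" here is a module an FX-style tracer should not recurse into. A generic illustration with `torch.fx` (the class-name check is a stand-in; the actual change registers SimpleFSDP in TorchRec's own tracer's leaf list):

```python
import torch.fx
import torch.nn as nn

class LeafAwareTracer(torch.fx.Tracer):
    """Tracer that treats FSDP-style wrapper modules as opaque leaves."""

    def is_leaf_module(self, m: nn.Module, module_qualified_name: str) -> bool:
        # Stand-in check: do not trace into SimpleFSDP wrappers.
        if type(m).__name__ == "SimpleFSDP":
            return True
        return super().is_leaf_module(m, module_qualified_name)
```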

CLA Signed
fb-exported

Using quantized embeddings with the float32 data type may lead to a floating point exception (core dumped). We can reproduce this with `python test_quant.py` in the environment `torchrec==1.1.0+cu124, torch==2.6.0+cu124,...
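The snippet truncates `test_quant.py`, so here is a guess at the shape of the repro using TorchRec's standard quantization flow (table config and inputs are invented); the float32 activation/output dtype is the combination the issue flags:

```python
import torch
from torchrec.modules.embedding_configs import EmbeddingBagConfig
from torchrec.modules.embedding_modules import EmbeddingBagCollection
from torchrec.quant.embedding_modules import (
    EmbeddingBagCollection as QuantEmbeddingBagCollection,
)
from torchrec.sparse.jagged_tensor import KeyedJaggedTensor

tables = [
    EmbeddingBagConfig(
        name="t1", embedding_dim=8, num_embeddings=100, feature_names=["f1"]
    )
]
ebc = EmbeddingBagCollection(tables=tables, device=torch.device("cpu"))

# qconfig with a float32 activation (output) dtype -- the combination the
# issue reports as crashing.
ebc.qconfig = torch.quantization.QConfig(
    activation=torch.quantization.PlaceholderObserver.with_args(dtype=torch.float32),
    weight=torch.quantization.PlaceholderObserver.with_args(dtype=torch.qint8),
)
qebc = QuantEmbeddingBagCollection.from_float(ebc)

features = KeyedJaggedTensor.from_lengths_sync(
    keys=["f1"],
    values=torch.tensor([1, 2, 3]),
    lengths=torch.tensor([2, 1]),
)
pooled = qebc(features)  # KeyedTensor of pooled embeddings
```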

There is an "alltoall" error when using row-wise sharding where some embedding bags use mean pooling while others use sum pooling. We can reproduce this using the following command: `torchrun --master_addr=localhost...
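A hedged sketch of how both tables might be pinned to row-wise sharding with mixed pooling via planner constraints (names and sizes are invented; the actual repro script is truncated above). The resulting plan would be passed to `DistributedModelParallel` inside the `torchrun`-launched processes:

```python
import torch
from torchrec.distributed.planner import EmbeddingShardingPlanner, Topology
from torchrec.distributed.planner.types import ParameterConstraints
from torchrec.distributed.types import ShardingType
from torchrec.modules.embedding_configs import EmbeddingBagConfig, PoolingType
from torchrec.modules.embedding_modules import EmbeddingBagCollection

# Two bags that differ only in pooling type -- the mix the issue reports
# as breaking the all-to-all under row-wise sharding.
tables = [
    EmbeddingBagConfig(name="t_sum", embedding_dim=8, num_embeddings=100,
                       feature_names=["f_sum"], pooling=PoolingType.SUM),
    EmbeddingBagConfig(name="t_mean", embedding_dim=8, num_embeddings=100,
                       feature_names=["f_mean"], pooling=PoolingType.MEAN),
]
ebc = EmbeddingBagCollection(tables=tables, device=torch.device("meta"))

# Constrain both tables to row-wise sharding.
constraints = {
    name: ParameterConstraints(sharding_types=[ShardingType.ROW_WISE.value])
    for name in ("t_sum", "t_mean")
}
planner = EmbeddingShardingPlanner(
    topology=Topology(world_size=2, compute_device="cuda"),
    constraints=constraints,
)
# plan = planner.collective_plan(ebc, sharders, pg)  # run inside each rank
```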

The `static_dict_gather` function encounters a precision issue when using `mc-ebc`. We can reproduce this with the command `torchrun --master_addr=localhost --master_port=49941 --nnodes=1 --nproc-per-node=2 test_mc_ebc_export.py` in the environment `torchrec==1.1.0+cu124, torch==2.6.0+cu124, fbgemm-gpu==1.1.0+cu124`. test_mc_ebc_export.py...
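`static_dict_gather` itself is not shown, so as a stand-in, here is the generic pattern for materializing a sharded state dict on one rank with `ShardedTensor.gather`, where the dtype of the output buffer is one place precision can be lost (all names here are illustrative):

```python
import torch
from torch.distributed._shard.sharded_tensor import ShardedTensor

def gather_state_dict(sharded_sd: dict, rank: int) -> dict:
    """Gather ShardedTensor entries of a state dict onto rank 0."""
    full = {}
    for key, value in sharded_sd.items():
        if isinstance(value, ShardedTensor):
            # The output buffer's dtype must match the shards'; a mismatch
            # here is one plausible source of precision loss.
            out = torch.empty(value.size(), dtype=value.dtype) if rank == 0 else None
            value.gather(dst=0, out=out)  # collective: every rank must call
            if rank == 0:
                full[key] = out
        else:
            full[key] = value
    return full
```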

When using `mc-ebc` with `pooling=PoolingType.MEAN`, we encounter `TypeError: unsupported operand type(s) for /: 'Tensor' and 'NoneType'`. We can reproduce this using the following command: `torchrun --master_addr=localhost --master_port=49941 --nnodes=1 --nproc-per-node=2...
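Same wiring as the `ManagedCollisionEmbeddingCollection` sketch further up, but with the bag variant and MEAN pooling, which is the reported trigger (configs again invented, not from the truncated repro script):

```python
import torch
from torchrec.modules.embedding_configs import EmbeddingBagConfig, PoolingType
from torchrec.modules.embedding_modules import EmbeddingBagCollection
from torchrec.modules.mc_embedding_modules import ManagedCollisionEmbeddingBagCollection
from torchrec.modules.mc_modules import (
    DistanceLFU_EvictionPolicy,
    ManagedCollisionCollection,
    MCHManagedCollisionModule,
)

tables = [
    EmbeddingBagConfig(
        name="t1", embedding_dim=8, num_embeddings=100,
        feature_names=["f1"], pooling=PoolingType.MEAN,  # reported trigger
    )
]
mc_ebc = ManagedCollisionEmbeddingBagCollection(
    EmbeddingBagCollection(tables=tables, device=torch.device("cpu")),
    ManagedCollisionCollection(
        {
            "t1": MCHManagedCollisionModule(
                zch_size=100,
                device=torch.device("cpu"),
                eviction_interval=1,
                eviction_policy=DistanceLFU_EvictionPolicy(),
            )
        },
        tables,
    ),
)
```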