torchrec issues

Fix pyre test on OSS remove ignore

1

Differential Revision: D72187346

aporialiao

CLA Signed

fb-exported

Allow undefined libraries to fix CMake errors

9

Summary: This should get rid of errors such as ``libtorch_cpu.so: undefined reference to `logf@GLIBC_2.27'`` that we see in our C++ unit tests on GitHub CI. See: https://fb.workplace.com/groups/1405155842844877/permalink/23944202491846891/ Note there is another error...

aporialiao

CLA Signed

fb-exported

[Bug]: ShardedManagedCollisionEmbeddingCollection throws an IndexError when "return_remapped_features=True"

1

I utilize `ManagedCollisionEmbeddingCollection` with `DistributedModelParallel` to store hashID embeddings during distributed training. An error occurs when setting `return_remapped_features=True` with **a single embedding table configuration**, but it resolves when a second...

rayhuang90

Force import with post script OSS job

1

Differential Revision: D71578741

aporialiao

CLA Signed

fb-exported

[oss][ci] fix tests and cmake C++ linking

Title says it all.

iamzainhuda

CLA Signed

Add SimpleFSDP to leaf node

2

Summary: TorchRec should not trace into SimpleFSDP, because doing so causes failures. We therefore added it to the leaf nodes. Differential Revision: D71450848

Microve

CLA Signed

fb-exported

Floating point exception (core dumped) when use quantize embeddings with float32 dtype

3

Using quantized embeddings with the float32 data type may lead to a floating point exception (core dumped). We can reproduce this with the following command: `python test_quant.py`, using the environment `torchrec==1.1.0+cu124, torch==2.6.0+cu124,...

tiankongdeguiji

row-wise alltoall error when some embeddings use mean pooling and others use sum pooling

2

An "alltoall" error occurs with row-wise sharding when some embedding bags use mean pooling while others use sum pooling. We can reproduce this using the following command: `torchrun --master_addr=localhost...

tiankongdeguiji

The `static_dict_gather` function encounters precision issue when use mc-ebc

1

The `static_dict_gather` function encounters a precision issue when using `mc-ebc`. We can reproduce this with the following command: `torchrun --master_addr=localhost --master_port=49941 --nnodes=1 --nproc-per-node=2 test_mc_ebc_export.py`, using the environment `torchrec==1.1.0+cu124, torch==2.6.0+cu124, fbgemm-gpu==1.1.0+cu124`. test_mc_ebc_export.py...

tiankongdeguiji

TypeError: unsupported operand type(s) for /: 'Tensor' and 'NoneType' when use mc-ebc and mean pooling

1

When using `mc-ebc` with `pooling=PoolingType.MEAN`, we encounter `TypeError: unsupported operand type(s) for /: 'Tensor' and 'NoneType'`. We can reproduce this using the following command: `torchrun --master_addr=localhost --master_port=49941 --nnodes=1 --nproc-per-node=2...

tiankongdeguiji
