torchrec
PyTorch domain library for recommendation systems
Summary: To support various types of eviction policies, the `HashZchManagedCollisionModule` needs to be able to calculate a score (e.g., TTL) for each incoming ID and pass it to the kernel....
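A minimal sketch of what such a per-ID score hook could look like; `compute_eviction_score` and `ttl_seconds` are illustrative names and not part of the `HashZchManagedCollisionModule` API:

```python
import time
import torch

def compute_eviction_score(ids: torch.Tensor, ttl_seconds: int = 3600) -> torch.Tensor:
    # Score each incoming ID with a TTL-style expiration timestamp; a kernel
    # could then evict slots whose score falls behind the current time.
    now = int(time.time())
    return torch.full_like(ids, now + ttl_seconds, dtype=torch.int64)

ids = torch.tensor([101, 202, 303], dtype=torch.int64)
scores = compute_eviction_score(ids)
# `scores` would be passed alongside `ids` into the managed-collision kernel.
```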
Summary: Implement a new swapping API which does the following: 1. Takes an exported program and torchrec serializer 2. Constructs torchrec modules based on serialized metadata stored in the exported...
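A hedged sketch of the proposed swap flow, assuming a serializer that exposes `load_metadata` and `deserialize` (placeholder names, not an existing TorchRec API):

```python
import torch
from torch.export import ExportedProgram

def swap_torchrec_modules(ep: ExportedProgram, serializer) -> torch.nn.Module:
    # 1. Read the serialized TorchRec metadata carried in the exported program.
    metadata = serializer.load_metadata(ep)  # assumed serializer method
    # 2. Rebuild eager TorchRec modules from that metadata.
    rebuilt = {fqn: serializer.deserialize(meta) for fqn, meta in metadata.items()}
    # 3. Swap the rebuilt modules back into the unflattened graph module.
    module = ep.module()
    for fqn, new_mod in rebuilt.items():
        parent_fqn, _, child_name = fqn.rpartition(".")
        parent = module.get_submodule(parent_fqn) if parent_fqn else module
        setattr(parent, child_name, new_mod)
    return module
```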
Summary: We parse torchrec logs from individual jobs (e.g. P1542961580). Our analyzer fails when Dense/KJT Storage is not available in the logs. This can be due to different types of...
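One way the analyzer could tolerate a missing Dense/KJT Storage entry is to parse those fields optionally; the regex and field names below are assumptions about the log format, not the actual parser:

```python
import re

def parse_storage(line: str) -> dict:
    # Missing fields stay None instead of raising, so the analyzer can
    # skip or default them downstream.
    stats = {"dense_storage_gb": None, "kjt_storage_gb": None}
    m = re.search(r"Dense Storage:\s*([\d.]+)\s*GB", line)
    if m:
        stats["dense_storage_gb"] = float(m.group(1))
    m = re.search(r"KJT Storage:\s*([\d.]+)\s*GB", line)
    if m:
        stats["kjt_storage_gb"] = float(m.group(1))
    return stats

print(parse_storage("Dense Storage: 1.5 GB"))
# {'dense_storage_gb': 1.5, 'kjt_storage_gb': None}
```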
Summary: Add an optimizer_key field to TorchRec's EmbeddingFusedOptimizer. During initialization of the embedding module BatchedFusedEmbeddingBag, pass the optimizer_key information from the fused parameters when creating the EmbeddingFusedOptimizer. In sparse arch, update the...
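A rough illustration of threading an `optimizer_key` from the fused parameters into the optimizer at construction time; the class and argument names here are simplified stand-ins, not the TorchRec implementations:

```python
from typing import Optional

class EmbeddingFusedOptimizerSketch:
    def __init__(self, params: dict, optimizer_key: Optional[str] = None):
        self.params = params
        # Identifies which fused optimizer these states belong to.
        self.optimizer_key = optimizer_key

def build_optimizer(fused_params: dict) -> EmbeddingFusedOptimizerSketch:
    # Pop the key so it is not forwarded to the embedding kernel itself.
    optimizer_key = fused_params.pop("optimizer_key", None)
    return EmbeddingFusedOptimizerSketch(fused_params, optimizer_key=optimizer_key)

opt = build_optimizer({"learning_rate": 0.01, "optimizer_key": "sparse_arch.ebc"})
print(opt.optimizer_key)  # sparse_arch.ebc
```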
Typo on line: [Typo here](https://github.com/pytorch/torchrec/blob/main/docs/source/index.rst?plain=1#L64)
Summary: Research doc: https://docs.google.com/document/d/1nDdQiJDnqJKzjzM3ku__Y5j196uxRVEB00Mj6qAl31k/edit Run ada model: https://www.internalfb.com/vanguard/serving_test_cases/487129480789691 We can see significant CPU time spent on cat, which is unnecessary for ada cases: we only cat one tensor, so it should be...
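A small illustration of the optimization being suggested: skip `torch.cat` entirely when the input list holds a single tensor (`maybe_cat` is a hypothetical helper, not an existing TorchRec function):

```python
import torch

def maybe_cat(tensors: list, dim: int = 0) -> torch.Tensor:
    # torch.cat allocates and copies even for a single-element list;
    # returning that element directly avoids the extra kernel launch and copy.
    if len(tensors) == 1:
        return tensors[0]
    return torch.cat(tensors, dim=dim)

x = torch.randn(4, 8)
assert maybe_cat([x]) is x               # no copy in the single-tensor case
y = maybe_cat([x, torch.randn(2, 8)])    # falls back to torch.cat otherwise
```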
We can reproduce this problem using the following command: `torchrun --master_addr=127.0.0.1 --master_port=1234 --nnodes=1 --nproc-per-node=1 --node_rank=0 test_optimizer_state.py --sharding_type $SHARDING_TYPE`, with the environment `torchrec==0.8.0+cu121, torch==2.4.0+cu121, fbgemm-gpu==0.8.0+cu121`. When **SHARDING_TYPE=row_wise**, it will print...
Summary: The legacy inference solution had duplicate headers that were causing autodep issues. All the inference_legacy references now only reference the inference_legacy folder. Differential Revision: D62901035