torchrec issues

change partitioner class methods to function

2

Summary: This diff converts unnecessary GreedyPerfPartitioner class methods into functions: it removes the GreedyPerfPartitioner class and keeps only the functions that are needed. Subsequently, GreedyPerfPartitioner was changed to a partition function in parallelized...

LBneus

CLA Signed

fb-exported
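
The refactor above replaces partitioner class methods with a plain function. As a rough illustration of what a stateless greedy, perf-based partition function can look like, here is a minimal sketch. It assumes each shard has a single scalar perf cost and uses the classic longest-processing-time heuristic (most expensive shard first, onto the least-loaded device). `greedy_perf_partition` and its inputs are hypothetical. This is not TorchRec's actual partitioner code.

```python
import heapq
from typing import Dict, List, Tuple


def greedy_perf_partition(
    shard_perfs: Dict[str, float], num_devices: int
) -> Dict[int, List[str]]:
    """Hypothetical sketch: assign each shard to the currently least-loaded device.

    shard_perfs maps an illustrative shard name to an assumed scalar perf cost.
    """
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    # Min-heap of (accumulated perf cost, device rank); ties go to the lower rank.
    heap: List[Tuple[float, int]] = [(0.0, rank) for rank in range(num_devices)]
    heapq.heapify(heap)
    plan: Dict[int, List[str]] = {rank: [] for rank in range(num_devices)}
    # Place the most expensive shards first (longest-processing-time heuristic).
    for name, perf in sorted(shard_perfs.items(), key=lambda kv: (-kv[1], kv[0])):
        load, rank = heapq.heappop(heap)
        plan[rank].append(name)
        heapq.heappush(heap, (load + perf, rank))
    return plan
```

For example, `greedy_perf_partition({"t0": 4.0, "t1": 3.0, "t2": 2.0, "t3": 1.0}, 2)` returns `{0: ["t0", "t3"], 1: ["t1", "t2"]}`, giving each device a total cost of 5.0. Because the function keeps no state between calls, it needs no enclosing class.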

discarding unwanted sharding_type-kernel combinations

2

Summary: This diff discards bad sharding_type-kernel combinations from the enumerator, so that bad sharding plans are never considered: the "data-parallel" sharding type is never paired with "batched-fused", "batched-fused-uvm", or...

LBneus

CLA Signed

fb-exported
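
The filtering described above can be pictured as a deny-list applied to the cartesian product of sharding types and compute kernels. The following minimal sketch assumes such a deny-list. `INVALID_KERNELS`, `enumerate_valid_combinations`, and the "table-wise"/"dense" strings are hypothetical. Only the two kernels named in the summary are listed, because the summary's full list is truncated. This is not the enumerator's actual code.

```python
from itertools import product
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical deny-list: sharding type -> compute kernels it must never pair with.
# Only the kernels named in the PR summary are included; the real list is longer.
INVALID_KERNELS: Dict[str, FrozenSet[str]] = {
    "data-parallel": frozenset({"batched-fused", "batched-fused-uvm"}),
}


def enumerate_valid_combinations(
    sharding_types: List[str], kernels: List[str]
) -> List[Tuple[str, str]]:
    """Return every (sharding_type, kernel) pair not on the deny-list."""
    return [
        (sharding_type, kernel)
        for sharding_type, kernel in product(sharding_types, kernels)
        if kernel not in INVALID_KERNELS.get(sharding_type, frozenset())
    ]
```

With `["data-parallel", "table-wise"]` and `["dense", "batched-fused", "batched-fused-uvm"]`, only `("data-parallel", "dense")` survives for data-parallel, while all three kernels remain for table-wise. Pruning at enumeration time means later planning stages never see these invalid options.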

Add Quantized Comms example to golden models

3

Show API usage to enable quantized comms

YLGH

CLA Signed

add JaggedTensorMeta

1

Summary: Add JaggedTensorMeta.
Differential Revision: D37840024

yinbinm

CLA Signed

fb-exported

Automated submodule update: FBGEMM

9

This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM). New submodule commit: https://github.com/pytorch/FBGEMM/commit/bff21c73487c5dc501acddb4788d985e9487bd68
Test Plan: Ensure that CI jobs succeed on GitHub before landing.

facebook-github-bot

CLA Signed

Support sparse gradient for embedding bag collection

1

Summary: See https://pytorch.org/docs/stable/generated/torch.nn.EmbeddingBag.html
Reviewed By: mrfox321
Differential Revision: D37792917

mmenz

CLA Signed

fb-exported
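
The linked `torch.nn.EmbeddingBag` docs describe a `sparse` option: when it is True, the gradient with respect to the weight matrix is a sparse tensor. The reason is that only the rows actually looked up receive gradient. This pure-Python sketch illustrates that idea for sum pooling, using the same `offsets` convention as `EmbeddingBag` (the start position of each bag in `indices`). Both function names are hypothetical. The sketch stores the sparse gradient as a dict keyed by row index, rather than as a real sparse tensor.

```python
from typing import Dict, List


def embedding_bag_sum(
    weight: List[List[float]], indices: List[int], offsets: List[int]
) -> List[List[float]]:
    """Sum-pool embedding rows; bag b covers indices[offsets[b]:offsets[b + 1]]."""
    dim = len(weight[0])
    bounds = offsets + [len(indices)]
    out = []
    for b in range(len(offsets)):
        pooled = [0.0] * dim
        for idx in indices[bounds[b]:bounds[b + 1]]:
            for d in range(dim):
                pooled[d] += weight[idx][d]
        out.append(pooled)
    return out


def embedding_bag_sum_sparse_grad(
    indices: List[int], offsets: List[int], grad_out: List[List[float]], dim: int
) -> Dict[int, List[float]]:
    """Gradient w.r.t. weight, kept only for rows that were looked up."""
    bounds = offsets + [len(indices)]
    grad: Dict[int, List[float]] = {}
    for b in range(len(offsets)):
        for idx in indices[bounds[b]:bounds[b + 1]]:
            # Each lookup of a row adds the bag's output gradient to that row.
            row = grad.setdefault(idx, [0.0] * dim)
            for d in range(dim):
                row[d] += grad_out[b][d]
    return grad
```

With a 4-row table, `indices=[0, 2, 2]` and `offsets=[0, 1]`, the gradient has entries only for rows 0 and 2. Rows 1 and 3 are never stored, which is what makes sparse gradients attractive for large embedding tables.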

Repro the ghost processes for TorchAsyncITer

1

Summary: First, generate the data:
bash nvt_preproc.sh /data/criteo/ /data/criteo_1_day/ 8192
Then run the command:
torchx run -s local_cwd dist.ddp -j 1x8 --script train_torchrec.py -- --num_embeddings_per_feature 45833188,36746,17245,7413,20243,3,7114,1441,62,29275261,1572176,345138,10,2209,11267,128,4,974,14,48937457,11316796,40094537,452104,12606,104,35 --over_arch_layer_sizes 1024,1024,512,256,1 --binary_path /data/criteo_1_day/criteo_preproc/train/...

RenfeiChen-FB

CLA Signed

fb-exported

TorchRec first class QuantizedComms support

4

Summary: The motivation is that we want to open-source (OSS) the quantized comms library and refactor TorchRec's quantized comms support. This diff is rather large (it used to be a bunch of small diffs)....

YLGH

CLA Signed

fb-exported
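
The core idea behind quantized comms is to encode tensors at lower precision before they go over the wire, and to decode them on the receiving side. This trades a small precision loss for fewer bytes sent. The following is a generic pure-Python sketch of that trade-off. It assumes an FP16 encoding and uses the standard library's `struct` half-precision format (`e`). The function names are hypothetical, and this is not TorchRec's quantized comms API.

```python
import struct
from typing import List


def encode_fp32(values: List[float]) -> bytes:
    """Baseline payload: 4 bytes per value (IEEE 754 single precision)."""
    return struct.pack(f"<{len(values)}f", *values)


def encode_fp16(values: List[float]) -> bytes:
    """Quantized payload: 2 bytes per value (IEEE 754 half precision).

    Raises OverflowError for magnitudes beyond the FP16 range (about 65504).
    """
    return struct.pack(f"<{len(values)}e", *values)


def decode_fp16(payload: bytes) -> List[float]:
    """Receiver side: unpack half-precision values back to Python floats."""
    return list(struct.unpack(f"<{len(payload) // 2}e", payload))
```

For `[0.1, 1.5, -2.25, 1000.0]`, the FP16 payload is 8 bytes instead of 16. The values 1.5, -2.25 and 1000.0 round-trip exactly, while 0.1 comes back only approximately. A real implementation would typically also apply the inverse encoding to gradients in the backward pass.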

base example

1

Base training loop examples. Run command: `torchx run -s local_cwd dist.ddp -j 1x8 --script train_dlrm.py`

Some TODO items:
1. Add NE/QPS metrics checkpointing
2. Show saving this model and...

YLGH

CLA Signed

rename quantized_comms_config -> qcomms_config

5

Summary: Rename quantized comms config.
Differential Revision: D37221312

YLGH

CLA Signed

fb-exported
