torchrec

PyTorch domain library for recommendation systems

455 torchrec issues, sorted by recently updated

Summary:

Context:
* please refer to this [plan doc](https://docs.google.com/document/d/1E45sbCPVA7JzG18BFS0tTOQHMLETGuZxkoupRwTwqkM/edit#heading=h.o7xaxy435ue4)
* add a trace for the pipeline benchmark (see the sketch below)
* add a single-process runner for the pipeline benchmark

{F1832319035}

Differential Revision: D61637749
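
As a hedged illustration of what adding a trace to the benchmark might involve, a minimal `torch.profiler` sketch; `pipeline`, `dataloader`, `num_steps`, and `trace_path` are stand-ins, not the diff's actual code, and the `progress(iterator)` call follows the TrainPipeline-style API:

```python
from torch.profiler import ProfilerActivity, profile

def run_benchmark_with_trace(pipeline, dataloader, num_steps, trace_path="pipeline_trace.json"):
    # Capture CPU and CUDA activity for the benchmarked steps.
    it = iter(dataloader)
    with profile(
        activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
        record_shapes=True,
    ) as prof:
        for _ in range(num_steps):
            pipeline.progress(it)  # each call advances the pipeline one step
    # Export a Chrome trace viewable in chrome://tracing or Perfetto.
    prof.export_chrome_trace(trace_path)
```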

Summary: Support optimizer gradient norm clipping in the nD parallelism case, specifically for EP + FSDP2: 1. For the global pg or device_mesh, the behavior is to `AllReduce` over all...
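
A minimal sketch of the underlying idea, assuming each rank holds a disjoint parameter shard (as in EP/FSDP-style sharding): sum the squared local gradient norms, `AllReduce` across the process group, then scale. The function and argument names are illustrative, not torchrec's implementation:

```python
import torch
import torch.distributed as dist

def clip_grad_norm_sharded(params, max_norm: float, process_group=None):
    # Assumes each rank holds a disjoint shard of the parameters, so the
    # global norm is the AllReduce-sum of squared local norms.
    grads = [p.grad for p in params if p.grad is not None]
    local_sq_norm = torch.stack([g.norm(2.0) ** 2 for g in grads]).sum()
    dist.all_reduce(local_sq_norm, op=dist.ReduceOp.SUM, group=process_group)
    total_norm = local_sq_norm.sqrt()
    clip_coef = torch.clamp(max_norm / (total_norm + 1e-6), max=1.0)
    for g in grads:
        g.mul_(clip_coef)  # every rank applies the same coefficient
    return total_norm
```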

Summary: collect performance-related metrics from the KV store and export them to ODS. Differential Revision: D61417980

Summary: To improve inference performance, we want to make creating a KJT as cheap as possible, which means the init method should be nothing more than an attribute setter. All other fields...
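
A toy sketch of the pattern being described, with a hypothetical `JaggedSketch` class standing in for KJT: `__init__` only assigns attributes, and derived fields such as offsets are computed lazily on first access.

```python
import torch
from typing import List, Optional

class JaggedSketch:
    # Toy stand-in for KJT: __init__ is a pure attribute setter; derived
    # fields like offsets are computed lazily, only when requested.
    def __init__(self, keys: List[str], values: torch.Tensor, lengths: torch.Tensor) -> None:
        self._keys = keys
        self._values = values
        self._lengths = lengths
        self._offsets: Optional[torch.Tensor] = None

    def offsets(self) -> torch.Tensor:
        if self._offsets is None:  # pay the cumsum cost only on first access
            zero = torch.zeros(1, dtype=self._lengths.dtype)
            self._offsets = torch.cat([zero, torch.cumsum(self._lengths, dim=0)])
        return self._offsets
```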

Summary: Enable TW pruning for TorchRec inference modules. We switch from pruned_indices_remapping to num_rows_post_pruning because the indices remapping isn't calculated until after physical transformations. Logical transformations (optimizing...
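
A hedged sketch of the two-phase idea as described; `make_pruned_table` and `remap_ids` are hypothetical names. The logical phase needs only `num_rows_post_pruning` to size the table, while the indices remapping is applied later, once it exists:

```python
import torch

# Phase 1 (logical): only the post-pruning row count is known, so the table
# can be sized without the remapping.
def make_pruned_table(num_rows_post_pruning: int, embedding_dim: int) -> torch.Tensor:
    return torch.empty(num_rows_post_pruning, embedding_dim)

# Phase 2 (physical, later): once the indices remapping is available, route
# lookups through it; ids remapped to -1 were pruned and fall back to row 0.
def remap_ids(ids: torch.Tensor, remapping: torch.Tensor) -> torch.Tensor:
    remapped = remapping[ids]
    return torch.where(remapped >= 0, remapped, torch.zeros_like(remapped))
```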

Summary: As title. Differential Revision: D61856064

Summary: Allow users to change the set_to_none param in KeyedOptimizer, which was previously not respected. torchrec's set_to_none defaults are unchanged for now, until we know the downstream effects. Differential Revision: D61728138
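
A toy illustration (not torchrec's actual `KeyedOptimizer`) of a `zero_grad` that respects its `set_to_none` argument:

```python
class KeyedOptimizerSketch:
    # Hypothetical wrapper showing the two zero_grad behaviors.
    def __init__(self, params) -> None:
        self._params = list(params)

    def zero_grad(self, set_to_none: bool = True) -> None:
        for p in self._params:
            if p.grad is None:
                continue
            if set_to_none:
                p.grad = None  # frees the buffer; next backward reallocates
            else:
                p.grad.detach_()
                p.grad.zero_()  # keeps the buffer and zeroes it in place
```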

Summary: Ran into 3 issues while enabling the pipeline for a model: 1) The current pipeline logic for finding and swapping a preproc module only works if the preproc module exists at...
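
For the first issue, a generic pattern (not necessarily the PR's actual fix) for finding and swapping a submodule by its dotted path, so nested preproc modules are reachable rather than only ones attached directly to the root model:

```python
import torch.nn as nn

def swap_submodule(root: nn.Module, fqn: str, replacement: nn.Module) -> None:
    # Walk the dotted path ("a.b.c") down to the parent of the target module,
    # then replace the leaf attribute in place.
    *parent_names, leaf = fqn.split(".")
    parent = root
    for name in parent_names:
        parent = getattr(parent, name)
    setattr(parent, leaf, replacement)
```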

I am tuning hyper-parameters on two different compute clusters. Since the number of GPUs on these clusters varies, I need to use gradient accumulation (GA) to ensure that the total...
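
For illustration, with hypothetical batch sizes, the usual arithmetic for keeping the effective global batch size constant across clusters with different GPU counts:

```python
# Hypothetical numbers: hold the effective global batch size fixed by
# scaling gradient-accumulation steps inversely with the GPU count.
TARGET_GLOBAL_BATCH = 4096
PER_GPU_BATCH = 32

def accumulation_steps(num_gpus: int) -> int:
    assert TARGET_GLOBAL_BATCH % (PER_GPU_BATCH * num_gpus) == 0
    return TARGET_GLOBAL_BATCH // (PER_GPU_BATCH * num_gpus)

print(accumulation_steps(16))  # 8 accumulation steps on a 16-GPU cluster
print(accumulation_steps(64))  # 2 accumulation steps on a 64-GPU cluster
```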

In the `_next_batch` method of `TrainPipelineSparseDist`, we check whether the new `dataloader_iter` is the same as the original `dataloader_iter`. We proceed to fetch the next batch only if they are...
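
A simplified sketch of the identity check being described; this is not the actual `TrainPipelineSparseDist` code:

```python
from typing import Iterator, Optional

class PipelineSketch:
    # Hypothetical class illustrating the iterator-identity check.
    def __init__(self) -> None:
        self._dataloader_iter: Optional[Iterator] = None

    def _next_batch(self, dataloader_iter: Iterator):
        if dataloader_iter is not self._dataloader_iter:
            # The caller passed a different iterator (e.g. a new epoch):
            # adopt it before fetching; otherwise keep consuming the cached one.
            self._dataloader_iter = dataloader_iter
        return next(self._dataloader_iter, None)
```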