torchrec

PyTorch domain library for recommendation systems

455 torchrec issues, sorted by recently updated

Summary:

Context:
* please refer to this [plan doc](https://docs.google.com/document/d/1E45sbCPVA7JzG18BFS0tTOQHMLETGuZxkoupRwTwqkM/edit#heading=h.o7xaxy435ue4)
* add a trace for the pipeline benchmark (see the sketch below)
* add a single-process runner for the pipeline benchmark

{F1832319035}

Differential Revision: D61637749
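
As a hedged illustration of what adding a trace to the benchmark might involve, a minimal `torch.profiler` sketch; `pipeline`, `dataloader`, `num_steps`, and `trace_path` are stand-ins, not the diff's actual code, and the `progress(iterator)` call follows the TrainPipeline-style API:

```python
from torch.profiler import ProfilerActivity, profile

def run_benchmark_with_trace(pipeline, dataloader, num_steps, trace_path="pipeline_trace.json"):
    # Capture CPU and CUDA activity for the benchmarked steps.
    it = iter(dataloader)
    with profile(
        activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
        record_shapes=True,
    ) as prof:
        for _ in range(num_steps):
            pipeline.progress(it)  # each call advances the pipeline one step
    # Export a Chrome trace viewable in chrome://tracing or Perfetto.
    prof.export_chrome_trace(trace_path)
```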

Summary: Support optimizer gradient norm clipping in the nD parallelism case, specifically for EP + FSDP2: 1. For the global pg or device_mesh, the behavior is to `AllReduce` over all...
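
A minimal sketch of the underlying idea, assuming each rank holds a disjoint parameter shard (as in EP/FSDP-style sharding): sum the squared local gradient norms, `AllReduce` across the process group, then scale. The function and argument names are illustrative, not torchrec's implementation:

```python
import torch
import torch.distributed as dist

def clip_grad_norm_sharded(params, max_norm: float, process_group=None):
    # Assumes each rank holds a disjoint shard of the parameters, so the
    # global norm is the AllReduce-sum of squared local norms.
    grads = [p.grad for p in params if p.grad is not None]
    local_sq_norm = torch.stack([g.norm(2.0) ** 2 for g in grads]).sum()
    dist.all_reduce(local_sq_norm, op=dist.ReduceOp.SUM, group=process_group)
    total_norm = local_sq_norm.sqrt()
    clip_coef = torch.clamp(max_norm / (total_norm + 1e-6), max=1.0)
    for g in grads:
        g.mul_(clip_coef)  # every rank applies the same coefficient
    return total_norm
```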

Summary: collect performance-related metrics from the KV store and export them to ODS. Differential Revision: D61417980

Summary: To improve inference performance, we want to make creating a KJT as cheap as possible, which means the init method should be nothing more than an attribute setter. All other fields...
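
A toy sketch of the pattern being described, with a hypothetical `JaggedSketch` class standing in for KJT: `__init__` only assigns attributes, and derived fields such as offsets are computed lazily on first access.

```python
import torch
from typing import List, Optional

class JaggedSketch:
    # Toy stand-in for KJT: __init__ is a pure attribute setter; derived
    # fields like offsets are computed lazily, only when requested.
    def __init__(self, keys: List[str], values: torch.Tensor, lengths: torch.Tensor) -> None:
        self._keys = keys
        self._values = values
        self._lengths = lengths
        self._offsets: Optional[torch.Tensor] = None

    def offsets(self) -> torch.Tensor:
        if self._offsets is None:  # pay the cumsum cost only on first access
            zero = torch.zeros(1, dtype=self._lengths.dtype)
            self._offsets = torch.cat([zero, torch.cumsum(self._lengths, dim=0)])
        return self._offsets
```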

Summary: Enable TW pruning for TorchRec inference modules. We switch from pruned_indices_remapping to num_rows_post_pruning because the indices remapping isn't calculated until after physical transformations. Logical transformations (optimizing...
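
A hedged sketch of the two-phase idea as described; `make_pruned_table` and `remap_ids` are hypothetical names. The logical phase needs only `num_rows_post_pruning` to size the table, while the indices remapping is applied later, once it exists:

```python
import torch

# Phase 1 (logical): only the post-pruning row count is known, so the table
# can be sized without the remapping.
def make_pruned_table(num_rows_post_pruning: int, embedding_dim: int) -> torch.Tensor:
    return torch.empty(num_rows_post_pruning, embedding_dim)

# Phase 2 (physical, later): once the indices remapping is available, route
# lookups through it; ids remapped to -1 were pruned and fall back to row 0.
def remap_ids(ids: torch.Tensor, remapping: torch.Tensor) -> torch.Tensor:
    remapped = remapping[ids]
    return torch.where(remapped >= 0, remapped, torch.zeros_like(remapped))
```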

Summary: As title. Differential Revision: D61856064

Summary: Allow users to change the set_to_none param in KeyedOptimizer, which was previously not respected. torchrec's set_to_none defaults are unchanged for now, until we know the downstream effects. Differential Revision: D61728138
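
A toy illustration (not torchrec's actual `KeyedOptimizer`) of a `zero_grad` that respects its `set_to_none` argument:

```python
class KeyedOptimizerSketch:
    # Hypothetical wrapper showing the two zero_grad behaviors.
    def __init__(self, params) -> None:
        self._params = list(params)

    def zero_grad(self, set_to_none: bool = True) -> None:
        for p in self._params:
            if p.grad is None:
                continue
            if set_to_none:
                p.grad = None  # frees the buffer; next backward reallocates
            else:
                p.grad.detach_()
                p.grad.zero_()  # keeps the buffer and zeroes it in place
```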

Summary: Ran into 3 issues while enabling the pipeline for a model: 1) The current pipeline logic for finding and swapping a preproc module only works if the preproc module exists at...
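
For the first issue, a generic pattern (not necessarily the PR's actual fix) for finding and swapping a submodule by its dotted path, so nested preproc modules are reachable rather than only ones attached directly to the root model:

```python
import torch.nn as nn

def swap_submodule(root: nn.Module, fqn: str, replacement: nn.Module) -> None:
    # Walk the dotted path ("a.b.c") down to the parent of the target module,
    # then replace the leaf attribute in place.
    *parent_names, leaf = fqn.split(".")
    parent = root
    for name in parent_names:
        parent = getattr(parent, name)
    setattr(parent, leaf, replacement)
```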

I am tuning hyper-parameters on two different compute clusters. Since the number of GPUs on these clusters varies, I need to use gradient accumulation (GA) to ensure that the total...
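
For illustration, with hypothetical batch sizes, the usual arithmetic for keeping the effective global batch size constant across clusters with different GPU counts:

```python
# Hypothetical numbers: hold the effective global batch size fixed by
# scaling gradient-accumulation steps inversely with the GPU count.
TARGET_GLOBAL_BATCH = 4096
PER_GPU_BATCH = 32

def accumulation_steps(num_gpus: int) -> int:
    assert TARGET_GLOBAL_BATCH % (PER_GPU_BATCH * num_gpus) == 0
    return TARGET_GLOBAL_BATCH // (PER_GPU_BATCH * num_gpus)

print(accumulation_steps(16))  # 8 accumulation steps on a 16-GPU cluster
print(accumulation_steps(64))  # 2 accumulation steps on a 64-GPU cluster
```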

In the `_next_batch` method of `TrainPipelineSparseDist`, we check whether the new `dataloader_iter` is the same as the original `dataloader_iter`. We proceed to fetch the next batch only if they are...
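
A simplified sketch of the identity check being described; this is not the actual `TrainPipelineSparseDist` code:

```python
from typing import Iterator, Optional

class PipelineSketch:
    # Hypothetical class illustrating the iterator-identity check.
    def __init__(self) -> None:
        self._dataloader_iter: Optional[Iterator] = None

    def _next_batch(self, dataloader_iter: Iterator):
        if dataloader_iter is not self._dataloader_iter:
            # The caller passed a different iterator (e.g. a new epoch):
            # adopt it before fetching; otherwise keep consuming the cached one.
            self._dataloader_iter = dataloader_iter
        return next(self._dataloader_iter, None)
```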