Henry Tsang

31 issues by Henry Tsang

Hi, in our repo, we have a Python file at torchrec/distributed/shard.py that contains a function named shard. We also want to keep the shard function inside torchrec/distributed/__init__.py, which we did by...
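The name clash described above (a submodule shard.py and a function shard that is also available from the package's __init__.py) can be reproduced in a throwaway package. This is a minimal sketch, assuming the function is re-exported with a plain from-import; the issue's actual approach is cut off above, and the package name demo_pkg and the helper build_demo_package are hypothetical.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_package(root: Path) -> None:
    """Create demo_pkg/ with a submodule and a function that share the name `shard`."""
    pkg = root / "demo_pkg"
    pkg.mkdir()
    # The submodule defines the function.
    (pkg / "shard.py").write_text("def shard(x):\n    return ('sharded', x)\n")
    # Assumed re-export style (not confirmed by the issue): a plain from-import.
    (pkg / "__init__.py").write_text("from demo_pkg.shard import shard\n")


tmp = tempfile.mkdtemp()
build_demo_package(Path(tmp))
sys.path.insert(0, tmp)

import demo_pkg

# After the from-import, the package attribute `shard` is the function,
# not the submodule: modules are not callable, functions are.
attr_is_function = callable(demo_pkg.shard)

# The submodule is still registered and reachable through the import machinery.
submodule = importlib.import_module("demo_pkg.shard")
same_function = submodule.shard is demo_pkg.shard

result = demo_pkg.shard(3)
```

Under this assumption, anything that resolves demo_pkg.shard by attribute access sees the function, while anything that looks up "demo_pkg.shard" in sys.modules sees the module.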

Summary: context: to be added Differential Revision: D57139157

CLA Signed
fb-exported

Summary: design doc: https://docs.google.com/document/d/17_nqdEtH6B_ev9Gnuw2mpgtFq4dqzC6-XUdw18R4F8Q/ # changes ## in ITEP make remap a callback of input dist ## in torchrec copy callbacks from input dist to fused input dist # impact...

CLA Signed
fb-exported

These talks capture what model parallelism is, what we are trying to do, and the different components of TorchRec: [Intro to Model Parallelism for Recommendations Systems](https://docs.google.com/presentation/d/1RJAoYGbqgyAPCGs8eKOBDhQTqD9MrznlaIOIC_f_3ww/edit?usp=drive_link) by Dennis [Torchrec Part 1](https://docs.google.com/presentation/d/1C6WKsbQBD8HIORH8bts-T5WK97dVidDNbHZZW738SSU/edit?usp=drive_link)...

Summary: Slightly refactor how cache params are passed from the dataclass to the fused_params dict. Differential Revision: D58886177

CLA Signed
fb-exported

Summary: backout D58372476 Reviewed By: sryap, francomomo Differential Revision: D58790885

CLA Signed
fb-exported

Summary: Fix horrible docstrings (created by me) currently on pytorch.org. The reason is that, for dataclasses, the docs need careful handling. Before: {F1644093577} the same attribute would appear twice. After:...

CLA Signed
fb-exported

## Bug Description acc tracer doesn't handle torch.max(tensor).values correctly ## To Reproduce ``` import torch import torch.nn as nn import torch_tensorrt.fx.tracer.acc_tracer.acc_tracer as acc_tracer device = torch.device("cuda") class MyModule(nn.Module): def forward(self,...

bug

Motivation: https://github.com/pytorch/pytorch/issues/139408 To reduce excessive warning logs. You can get back previous behavior by prepending `TORCH_LOGS="dynamic" ` repro: https://github.com/pytorch/pytorch/issues/139408 after: ``` /torch/fx/experimental/symbolic_shapes.py:6452] runtime_asserts_frozen but then got 3*TruncToInt(IntTrueDiv(s0, 1))*TruncToInt(IntTrueDiv(s1, 1)) <...

fb-exported
ciflow/trunk
release notes: fx
fx
ciflow/inductor
merging

**Describe the bug** ConvOperation3x seems to have two methods both called extended_name. One depends on layouts of A and B. Which one is being used? https://github.com/NVIDIA/cutlass/blob/affd1b693dfc121c51118cbc8583dfd308227ca6/python/cutlass_library/generator.py#L989-L1015 **Steps/Code to reproduce bug**...

bug
? - Needs Triage
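On the question of two methods sharing one name: in Python generally, a second def with the same name inside a class body rebinds that name, so the later definition is the one that gets called. The sketch below uses a hypothetical stand-in class, not the CUTLASS code; whether ConvOperation3x behaves this way depends on the linked source.

```python
class Operation:
    """Hypothetical stand-in used only to illustrate name rebinding."""

    def extended_name(self):
        return "first definition"

    def extended_name(self):  # same name: replaces the method defined above
        return "second definition"


chosen = Operation().extended_name()
print(chosen)  # second definition
```

Python raises no error or warning for this, so the earlier definition is silently unreachable.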