Brian Hirsh

Results 100 comments of Brian Hirsh

@Chillee @ezyang this should be ready for another round. Some high-level questions/notes: (a) What do you all think of the partitioner calling convention changes? (b) Should functionalization (and the...

@Chillee (and also @wconstab, since you're familiar with the partitioner) friendly bump. I still need to rebase this though, on top of the recent partitioner changes.

I'm actually going to close this PR and create a fresh one. Thanks for the stamp, and sorry about that, Will. This is mostly because: (1) Ed's "trace with functionalization...

I don't see a label for `torch text`. @ejguan what label do you think this should go under?

Is the tensor allocation the main source of unacceptable overhead? We could probably figure out a way to avoid it, but another source of overhead is that a lot of...
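
For context on the allocation question above, here's a hypothetical micro-benchmark sketch (the function names, shapes, and iteration counts are made up, not from this thread) for separating the cost of allocating a fresh tensor from the rest of the per-call overhead:

```
import time
import torch

# Hypothetical micro-benchmark (not from the original thread): compare
# allocating a fresh tensor on every call vs. reusing a preallocated buffer,
# to get a rough sense of how much of the per-call cost is allocation alone.
def alloc_each_time(n, shape):
    for _ in range(n):
        torch.empty(shape).zero_()

def reuse_buffer(n, shape):
    buf = torch.empty(shape)
    for _ in range(n):
        buf.zero_()

for fn in (alloc_each_time, reuse_buffer):
    start = time.perf_counter()
    fn(10_000, (64, 64))
    print(f"{fn.__name__}: {time.perf_counter() - start:.3f}s")
```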

The stack here: https://github.com/pytorch/pytorch/pull/123347 looks like it's finally enough to get the torchtrain repro working, with this change:
```
diff --git a/train.py b/train.py
index 849ae78..171842a 100644
--- a/train.py
+++ b/train.py
...
```

Tentatively marking hi-pri, since running the backward at a different precision than the user asked for seems bad.
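
As an illustration of what "running the backward at a different precision than the user asked for" can look like in user code, here is a minimal hedged sketch using autocast (this is an assumed repro shape for illustration, not the actual code from the issue):

```
import torch

# Hypothetical repro sketch: leaves are float32, forward runs under autocast.
# One would expect the gradients flowing back to the float32 leaves to come
# out as float32; a backward run at a different precision would be surprising.
x = torch.randn(4, 4, dtype=torch.float32, requires_grad=True)
w = torch.randn(4, 4, dtype=torch.float32, requires_grad=True)

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = (x @ w).sum()

out.backward()
print(x.grad.dtype, w.grad.dtype)  # inspect the gradient dtypes
```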

I'm probably just forgetting something, but: if an op has a `CompositeImplicitAutograd` decomposition, should vmap **always** run that decomposition? Since, as you mentioned above, we definitely don't want...
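
For readers who haven't seen it, here's a small hedged sketch of the kind of call being discussed: vmap over an op that is (conceptually) expressible in terms of other ops. This only shows the user-facing API, not how the dispatcher chooses between a batching rule and the decomposition:

```
import torch
from torch.func import vmap

# Illustrative only: `linear` can be expressed via matmul + add, so a
# transform like vmap can either hit a batching rule for the op itself or
# fall back to running its decomposition into the underlying ops.
def f(x, w, b):
    return torch.nn.functional.linear(x, w, b)

xs = torch.randn(8, 3)   # batch of 8 inputs
w = torch.randn(5, 3)
b = torch.randn(5)

out = vmap(f, in_dims=(0, None, None))(xs, w, b)
print(out.shape)  # torch.Size([8, 5])
```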

Actually, it looks like scalar->tensor conversions get special treatment in torch_dispatch: `__torch_dispatch__` has some special logic to convert "wrapped tensor-scalars" back into scalar Python constants ([code](https://github.com/pytorch/pytorch/blob/c0ed0f22cdc5ea80710c845f64d2e9a8026fb810/torch/csrc/jit/python/pybind_utils.cpp#L511)). But that doesn't work...
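
To make the "wrapped tensor-scalar" behavior concrete, here's a minimal hedged sketch of a logging `__torch_dispatch__` subclass (a hypothetical illustration, not the linked code) that prints the Python types of the arguments each op receives, so you can see how a scalar argument shows up at the dispatch layer:

```
import torch
from torch.utils._pytree import tree_map

# Hypothetical logging subclass: print what each dispatched op receives,
# so scalar args can be observed as Python numbers vs. 0-dim tensors.
class LoggingTensor(torch.Tensor):
    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        print(func, [type(a).__name__ for a in args])

        # Unwrap back to plain tensors before re-dispatching so the handler
        # doesn't recurse into itself.
        def unwrap(a):
            return a.as_subclass(torch.Tensor) if isinstance(a, LoggingTensor) else a

        return func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs))

x = torch.randn(3).as_subclass(LoggingTensor)
x + 2.0  # check how the `2.0` shows up in the printed args
```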

I tried that briefly and it caused some problems (but maybe we should just fix those problems haha). Specifically, because the `where.Scalar` decomp **also** changes the dtype when converting the...
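
As a quick reference point for the dtype question, here's a hedged snippet (purely illustrative, not the decomp in question) showing eager-mode `torch.where` with a Python scalar, so its output dtype can be compared against what a decomposed path produces:

```
import torch

# Illustrative check: a Python scalar participates in type promotion, so the
# output dtype depends on how the scalar gets handled when it's turned into
# a tensor. Compare this eager result against the decomposed path.
cond = torch.tensor([True, False, True])
x = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float16)

out = torch.where(cond, x, 0.5)
print(out.dtype)  # inspect whether the scalar bumped the dtype
```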