Brian Hirsh

Results 100 comments of Brian Hirsh

@Chillee @ezyang this should be ready for another round. Some high-level questions/notes: (a) What do you all think of the partitioner calling convention changes? (b) Should functionalization (and the...

@Chillee (and also @wconstab, since you're familiar with the partitioner) friendly bump. I still need to rebase this though, on top of the recent partitioner changes.

I'm actually going to close this PR and create a fresh one. Thanks for the stamp, and sorry about that, Will. This is mostly because: (1) Ed's "trace with functionalization...

I don't see a label for `torch text`. @ejguan what label do you think this should go under?

Is the tensor allocation the main source of unacceptable overhead? We could probably figure out a way to avoid it, but another source of overhead is that a lot of...
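
For context on the allocation question above, here's a hypothetical micro-benchmark sketch (the function names, shapes, and iteration counts are made up, not from this thread) for separating the cost of allocating a fresh tensor from the rest of the per-call overhead:

```
import time
import torch

# Hypothetical micro-benchmark (not from the original thread): compare
# allocating a fresh tensor on every call vs. reusing a preallocated buffer,
# to get a rough sense of how much of the per-call cost is allocation alone.
def alloc_each_time(n, shape):
    for _ in range(n):
        torch.empty(shape).zero_()

def reuse_buffer(n, shape):
    buf = torch.empty(shape)
    for _ in range(n):
        buf.zero_()

for fn in (alloc_each_time, reuse_buffer):
    start = time.perf_counter()
    fn(10_000, (64, 64))
    print(f"{fn.__name__}: {time.perf_counter() - start:.3f}s")
```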

The stack here: https://github.com/pytorch/pytorch/pull/123347 looks like it's finally enough to get the torchtrain repro working, with this change:
```
diff --git a/train.py b/train.py
index 849ae78..171842a 100644
--- a/train.py
+++ b/train.py
...
```

Tentatively marking hi-pri, since running the backward at a different precision than the user asked for seems bad.
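
As an illustration of what "running the backward at a different precision than the user asked for" can look like in user code, here is a minimal hedged sketch using autocast (this is an assumed repro shape for illustration, not the actual code from the issue):

```
import torch

# Hypothetical repro sketch: leaves are float32, forward runs under autocast.
# One would expect the gradients flowing back to the float32 leaves to come
# out as float32; a backward run at a different precision would be surprising.
x = torch.randn(4, 4, dtype=torch.float32, requires_grad=True)
w = torch.randn(4, 4, dtype=torch.float32, requires_grad=True)

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    out = (x @ w).sum()

out.backward()
print(x.grad.dtype, w.grad.dtype)  # inspect the gradient dtypes
```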

I'm probably just forgetting something, but: if an op has a `CompositeImplicitAutograd` decomposition, should vmap **always** run that decomposition? Since, as you mentioned above, we definitely don't want...
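
For readers who haven't seen it, here's a small hedged sketch of the kind of call being discussed: vmap over an op that is (conceptually) expressible in terms of other ops. This only shows the user-facing API, not how the dispatcher chooses between a batching rule and the decomposition:

```
import torch
from torch.func import vmap

# Illustrative only: `linear` can be expressed via matmul + add, so a
# transform like vmap can either hit a batching rule for the op itself or
# fall back to running its decomposition into the underlying ops.
def f(x, w, b):
    return torch.nn.functional.linear(x, w, b)

xs = torch.randn(8, 3)   # batch of 8 inputs
w = torch.randn(5, 3)
b = torch.randn(5)

out = vmap(f, in_dims=(0, None, None))(xs, w, b)
print(out.shape)  # torch.Size([8, 5])
```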

Actually, it looks like scalar->tensor conversions get special treatment in torch_dispatch: `__torch_dispatch__` has some special logic to convert "wrapped tensor-scalars" back into scalar Python constants ([code](https://github.com/pytorch/pytorch/blob/c0ed0f22cdc5ea80710c845f64d2e9a8026fb810/torch/csrc/jit/python/pybind_utils.cpp#L511)). But that doesn't work...
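
To make the "wrapped tensor-scalar" behavior concrete, here's a minimal hedged sketch of a logging `__torch_dispatch__` subclass (a hypothetical illustration, not the linked code) that prints the Python types of the arguments each op receives, so you can see how a scalar argument shows up at the dispatch layer:

```
import torch
from torch.utils._pytree import tree_map

# Hypothetical logging subclass: print what each dispatched op receives,
# so scalar args can be observed as Python numbers vs. 0-dim tensors.
class LoggingTensor(torch.Tensor):
    @classmethod
    def __torch_dispatch__(cls, func, types, args=(), kwargs=None):
        kwargs = kwargs or {}
        print(func, [type(a).__name__ for a in args])

        # Unwrap back to plain tensors before re-dispatching so the handler
        # doesn't recurse into itself.
        def unwrap(a):
            return a.as_subclass(torch.Tensor) if isinstance(a, LoggingTensor) else a

        return func(*tree_map(unwrap, args), **tree_map(unwrap, kwargs))

x = torch.randn(3).as_subclass(LoggingTensor)
x + 2.0  # check how the `2.0` shows up in the printed args
```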

I tried that briefly and it caused some problems (but maybe we should just fix those problems haha). Specifically, because the `where.Scalar` decomp **also** changes the dtype when converting the...
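
As a quick reference point for the dtype question, here's a hedged snippet (purely illustrative, not the decomp in question) showing eager-mode `torch.where` with a Python scalar, so its output dtype can be compared against what a decomposed path produces:

```
import torch

# Illustrative check: a Python scalar participates in type promotion, so the
# output dtype depends on how the scalar gets handled when it's turned into
# a tensor. Compare this eager result against the decomposed path.
cond = torch.tensor([True, False, True])
x = torch.tensor([1.0, 2.0, 3.0], dtype=torch.float16)

out = torch.where(cond, x, 0.5)
print(out.dtype)  # inspect whether the scalar bumped the dtype
```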