Jacob Hinkle
Despite having two reshapes, the second case produces a single kernel with either float or half inputs. I'm not sure how that is happening given the two reshapes, so it...
This is a great example as it's used in any multiclass classification problem or learned compression network. Should we focus on separate forward and backward or also look at the...
Yeah always size 1, and it can be any of the values 0-32767. The index is the true label for an example, and the tensor we're indexing would be the...
`torch.gather(Z, 1, Y.unsqueeze(1)).squeeze()` would be the common pattern, where `Z` is `torch.log_softmax(logits, 1)` with size [N, C] and `Y` has size [N].
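For concreteness, here is a minimal sketch of that pattern. The helper name `true_class_log_probs` and the sizes `N = 4`, `C = 32768` are made up for illustration (`C` is picked to match the 0-32767 label range mentioned above). It uses `squeeze(1)` instead of a bare `squeeze()` so that a batch of size 1 keeps shape [1]. The result should agree with plain advanced indexing and with the negated per-example NLL loss.

```python
import torch
import torch.nn.functional as F


def true_class_log_probs(logits, labels):
    """Hypothetical helper: log-probability of the true class for each row."""
    Z = torch.log_softmax(logits, 1)  # [N, C]
    # labels.unsqueeze(1) is [N, 1]; gather along dim 1 returns [N, 1]
    return torch.gather(Z, 1, labels.unsqueeze(1)).squeeze(1)  # [N]


torch.manual_seed(0)
N, C = 4, 32768  # assumed sizes, for illustration only
logits = torch.randn(N, C)
Y = torch.randint(0, C, (N,))  # int64 labels, as gather requires

picked = true_class_log_probs(logits, Y)

# Two equivalent ways to compute the same values, for comparison
Z = torch.log_softmax(logits, 1)
by_index = Z[torch.arange(N), Y]
by_nll = -F.nll_loss(Z, Y, reduction="none")
```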
It might complicate things, but I'll note that this pattern when used in a loss is typically followed immediately by a reduction, so that the combo could be implemented by...
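As a rough illustration of why the gather and the reduction can be treated as one operation, the sketch below compares gather-then-sum against a single masked reduction over the full [N, C] tensor. This is only one possible formulation, not necessarily the one the comment above had in mind; the function names and sizes are hypothetical.

```python
import torch


def gather_then_sum(Z, Y):
    """Reference: gather the labelled entry per row, then reduce."""
    return torch.gather(Z, 1, Y.unsqueeze(1)).sum()


def masked_sum(Z, Y):
    """Same value without materializing the gather output:
    keep Z[i, j] only where j == Y[i], then reduce everything."""
    cols = torch.arange(Z.shape[1], device=Z.device)  # [C]
    mask = cols.unsqueeze(0) == Y.unsqueeze(1)        # [N, C] via broadcasting
    return torch.where(mask, Z, torch.zeros_like(Z)).sum()


torch.manual_seed(0)
N, C = 6, 10  # assumed small sizes, for illustration only
Z = torch.log_softmax(torch.randn(N, C), 1)
Y = torch.randint(0, C, (N,))

ref = gather_then_sum(Z, Y)
fused = masked_sum(Z, Y)
```

The masked form reads all of `Z` rather than N elements, so whether it is a win depends on how the surrounding ops are fused.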
Interesting. In the special case where the gather output has a **single** use which is a reduction including the gather axis, rewriting the graph from ```c++ tv2 = torch_gather(tv1, tv_index,...
Note this is not specific to `where`; a similar error is made for `add(tv0, IrBuilder::create(5.0))`.
I'm reimplementing the tests from https://github.com/csarofeen/pytorch/blob/cpp_frontend_tests/test/test_nvfuser_frontend.py#L58 and using the same naming scheme. I'll make this clearer in the comments.
Force-pushed (rebased onto devel following Jie's refactor).
I believe the `squeeze` might get introduced here: https://github.com/Lightning-AI/lightning-thunder/blob/21667f5ae95cb8fa23edbe7438a4a27983ee87fd/thunder/clang/__init__.py#L933-L936 That indexing method is called in the diagonal clang op here: https://github.com/Lightning-AI/lightning-thunder/blob/21667f5ae95cb8fa23edbe7438a4a27983ee87fd/thunder/clang/__init__.py#L383-L398