Jacob Hinkle comments

`views` in HF-Bart Self-Attention

Despite having two reshapes, the second case produces 1 kernel with either float or half inputs. I'm not sure how that is happening since there are two reshapes, so it...

Example of Gather/Scatter from CrossEntropyLoss

This is a great example as it's used in any multiclass classification problem or learned compression network. Should we focus on separate forward and backward or also look at the...

Example of Gather/Scatter from CrossEntropyLoss

Yeah always size 1, and it can be any of the values 0-32767. The index is the true label for an example, and the tensor we're indexing would be the...

Example of Gather/Scatter from CrossEntropyLoss

`torch.gather(Z, 1, Y.unsqueeze(1)).squeeze()` would be the common pattern, where `Z` is `torch.log_softmax(logits, 1)` of size [N, C] and `Y` is of size [N].
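To make the indexing in that pattern concrete, here is a minimal pure-Python sketch of what the gather computes. It avoids a PyTorch dependency, so `log_softmax_rows`, `gather_dim1`, and `picked_log_probs` are hypothetical helper names that only mimic the semantics of `torch.log_softmax(logits, 1)` and `torch.gather(Z, 1, index)` on nested lists. They are not part of any library.

```python
import math

def log_softmax_rows(logits):
    """Row-wise log-softmax of a nested list of shape [N, C]."""
    out = []
    for row in logits:
        m = max(row)  # subtract the row max for numerical stability
        lse = m + math.log(sum(math.exp(v - m) for v in row))
        out.append([v - lse for v in row])
    return out

def gather_dim1(Z, index):
    """Mimic torch.gather(Z, 1, index): out[i][k] = Z[i][index[i][k]]."""
    return [[Z[i][j] for j in index[i]] for i in range(len(index))]

def picked_log_probs(logits, Y):
    """Log-probability of the true label for each example."""
    Z = log_softmax_rows(logits)                  # shape [N, C]
    gathered = gather_dim1(Z, [[y] for y in Y])   # Y.unsqueeze(1) -> index of shape [N, 1]
    return [row[0] for row in gathered]           # drop the size-1 axis -> shape [N]

# Illustrative data (assumed for this sketch): N = 2 examples, C = 3 classes.
logits = [[2.0, 1.0, 0.1], [0.5, 2.5, 0.3]]
Y = [0, 1]
picked = picked_log_probs(logits, Y)
loss = -sum(picked) / len(picked)  # mean negative log-likelihood
```

Each row of the index holds a single class id, so the gather selects exactly one element per example.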

Example of Gather/Scatter from CrossEntropyLoss

It might complicate things, but I'll note that this pattern when used in a loss is typically followed immediately by a reduction, so that the combo could be implemented by...
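The comment above is cut off, so the implementation it goes on to propose is not known. As one possible reading, the sketch below compares a gather that materializes its output before reducing with a single pass that accumulates the selected elements directly. The function names and sample values are assumptions for illustration only.

```python
def gather_then_sum(Z, Y):
    """Two steps: materialize the gathered [N] values, then reduce them."""
    gathered = [Z[i][Y[i]] for i in range(len(Y))]
    return sum(gathered)

def fused_gather_sum(Z, Y):
    """One pass: accumulate Z[i][Y[i]] directly, with no intermediate list."""
    total = 0.0
    for i, y in enumerate(Y):
        total += Z[i][y]
    return total

# Illustrative log-probabilities (assumed values), shape [N, C] = [2, 3].
Z = [[-0.4, -1.4, -2.3], [-2.2, -0.2, -2.4]]
Y = [0, 1]
two_step = gather_then_sum(Z, Y)
fused = fused_gather_sum(Z, Y)
```

Both functions return the same value; the second never stores the gathered tensor.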

Example of Gather/Scatter from CrossEntropyLoss

Interesting. In the special case where the gather output has a **single** use which is a reduction including the gather axis, rewriting the graph from ```c++ tv2 = torch_gather(tv1, tv_index,...

Compile error in `where(x, a, b)` with single precision `a` or `b`

Note that this is not specific to `where`; a similar error occurs for `add(tv0, IrBuilder::create(5.0))`.

C++ version of frontend tests

I'm reimplementing the tests from https://github.com/csarofeen/pytorch/blob/cpp_frontend_tests/test/test_nvfuser_frontend.py#L58 and using the same naming scheme. I'll make this clearer in comments.

C++ version of frontend tests

Force-pushed (rebased onto devel following Jie's refactor).

Enable `take` and `take_along_axis` in nvfuser executor

I believe the `squeeze` might get introduced here: https://github.com/Lightning-AI/lightning-thunder/blob/21667f5ae95cb8fa23edbe7438a4a27983ee87fd/thunder/clang/__init__.py#L933-L936 and that indexing method is called in the diagonal clang op here: https://github.com/Lightning-AI/lightning-thunder/blob/21667f5ae95cb8fa23edbe7438a4a27983ee87fd/thunder/clang/__init__.py#L383-L398