lightning-thunder
Add MoE layer example
DRAFT MODE TO PREVENT MERGES. The approach and code are ready for experimentation and review.
The main result of this PR is that Thunder can run a variant of the MoE layer from LitGPT. There are three modifications:
- zip is replaced with a for-loop with explicit indexing into lists (blocked by https://github.com/Lightning-AI/lightning-thunder/issues/284).
- Thunder doesn't support advanced indexing with None (need to create an issue). The workaround is to use unsqueeze instead of None when indexing.
- Inplace addition (+=) is replaced with index_add.
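To make the three modifications concrete, here is a minimal PyTorch-only sketch. It is not the code from this PR or from LitGPT: the function names `moe_reference` and `moe_rewritten`, the toy sizes, and the random router logits are all assumptions for illustration, and the Thunder compilation step is left out. It only checks that the rewritten loop computes the same result as the original-style loop in eager mode.

```python
import torch


def moe_reference(x, probs, masks, experts):
    # Original-style loop: zip, indexing with None, and in-place +=.
    y = torch.zeros_like(x)
    for mask, expert in zip(masks, experts):
        token_idx, expert_idx = torch.where(mask)
        y[token_idx] += probs[token_idx, expert_idx, None] * expert(x[token_idx])
    return y


def moe_rewritten(x, probs, masks, experts):
    # Same computation with the three modifications listed above.
    y = torch.zeros_like(x)
    for i in range(len(experts)):  # explicit indexing instead of zip
        token_idx, expert_idx = torch.nonzero(masks[i], as_tuple=True)
        # unsqueeze(-1) instead of indexing with None
        weight = probs[token_idx, expert_idx].unsqueeze(-1)
        # out-of-place index_add instead of +=
        y = torch.index_add(y, 0, token_idx, weight * experts[i](x[token_idx]))
    return y


# Toy setup (all sizes are arbitrary assumptions).
torch.manual_seed(0)
n_tokens, dim, n_expert, top_k = 6, 4, 3, 2
x = torch.randn(n_tokens, dim)
experts = [torch.nn.Linear(dim, dim) for _ in range(n_expert)]
router_logits = torch.randn(n_tokens, n_expert)  # stands in for the gate output
probs, indices = torch.topk(router_logits, top_k)
probs = probs.softmax(dim=1)
# masks[e, t, k] is True when token t routes to expert e in its k-th slot.
masks = (indices.unsqueeze(-1) == torch.arange(n_expert)).permute(2, 0, 1)

with torch.no_grad():
    y_ref = moe_reference(x, probs, masks, experts)
    y_new = moe_rewritten(x, probs, masks, experts)
```

Within one expert's mask each token appears at most once, so `index_add` and `+=` agree here; `index_add` would also accumulate correctly if indices repeated.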
The main missing operator is nonzero(x, as_tuple=True). The problem with this operator is that its output shape is unknown at compile time and varies with the data at runtime. I tried a NumberProxy with a None value, a NumberProxy with a custom int subclass as its value, and a custom int subclass used directly, but a plain -1 in the shape worked best.
The forward pass worked with just 14ce0978aaf32803884fe66b891b6a4ccd2fca7d. The backward pass required more special handling of -1.
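The data dependence can be seen in a small eager-mode example. The sketch below is only an illustration of the idea, not Thunder's implementation: the `UNKNOWN` marker and the `shape_matches` helper are hypothetical names showing how a -1 entry in a traced shape could act as a wildcard for a dimension that is only known at runtime.

```python
import torch

UNKNOWN = -1  # hypothetical marker for a dimension unknown at trace time


def shape_matches(traced_shape, runtime_shape):
    # A traced dimension of -1 accepts any runtime size; others must be equal.
    return len(traced_shape) == len(runtime_shape) and all(
        t == UNKNOWN or t == r for t, r in zip(traced_shape, runtime_shape)
    )


# Two inputs with identical shape and dtype but different contents.
a = torch.tensor([[True, False], [False, False]])
b = torch.tensor([[True, True], [False, True]])

rows_a, cols_a = torch.nonzero(a, as_tuple=True)
rows_b, cols_b = torch.nonzero(b, as_tuple=True)

# The output length equals the number of True entries, so it differs per input.
traced = (UNKNOWN,)
ok_a = shape_matches(traced, tuple(rows_a.shape))
ok_b = shape_matches(traced, tuple(rows_b.shape))
```

A trace that recorded the concrete length seen for `a` would not be valid for `b`, which is why some placeholder for the unknown dimension is needed.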
Currently, index_add, index_select, topk are not fused with any of Thunder's fusing executors.
Super exciting! Really looking forward to discussing this in more detail at a design review!
@t-vi and @carmocca, I think you'll be interested in this
Do we have any broader ideas for how this fits into the strategy for handling dynamic and data dependent shapes? I was under the impression that this was just something we were completely incapable of doing with the way that we're modeling traces.
> Thunder doesn't support advanced indexing with None (need to create an issue). The workaround is to use unsqueeze instead of None when indexing.
I vaguely remember running into None in indexing, but I think I only saw that with basic indexing...