Add MoE layer example

Open IvanYashchuk opened this issue 1 year ago • 4 comments

DRAFT MODE TO PREVENT MERGES. The approach and code are ready for experimentation and review.

The main result of this PR is that Thunder can run a variant of the MoE layer from LitGPT. The variant has three modifications:

  1. zip is replaced with a for-loop with explicit indexing into lists (blocked by https://github.com/Lightning-AI/lightning-thunder/issues/284).
  2. Thunder doesn't support advanced indexing with None (need to create an issue). The workaround is to use unsqueeze instead of None when indexing.
  3. In-place addition (+=) is replaced with index_add.
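
As a rough illustration of the three modifications, here is a minimal plain-PyTorch sketch. It is only loosely modeled on a top-k MoE layer and is not the code from this PR: the sizes, the `route`, `moe_original`, and `moe_workaround` helpers, and the choice of the out-of-place `torch.index_add` are all assumptions made for the example. It only checks that the rewritten loop computes the same result as the original formulation in eager mode.

```python
import torch

torch.manual_seed(0)

# Hypothetical sizes and modules, chosen only for this illustration.
n_tokens, dim, n_expert, top_k = 6, 4, 3, 2
x = torch.randn(n_tokens, dim)
gate = torch.nn.Linear(dim, n_expert, bias=False)
experts = [torch.nn.Linear(dim, dim, bias=False) for _ in range(n_expert)]


def route(x):
    # Pick top_k experts per token and build one boolean mask per expert.
    probs, indices = torch.topk(gate(x), top_k)
    probs = probs.softmax(dim=1)
    masks = indices.unsqueeze(-1) == torch.arange(n_expert)  # (tokens, top_k, n_expert)
    return probs, masks.permute(2, 0, 1)  # (n_expert, tokens, top_k)


def moe_original(x):
    # Original style: zip, indexing with None, in-place +=.
    probs, masks = route(x)
    y = torch.zeros_like(x)
    for mask, expert in zip(masks, experts):
        token_idx, slot_idx = torch.nonzero(mask, as_tuple=True)
        y[token_idx] += probs[token_idx, slot_idx, None] * expert(x[token_idx])
    return y


def moe_workaround(x):
    # Modified style: explicit indexing, unsqueeze, index_add.
    probs, masks = route(x)
    y = torch.zeros_like(x)
    for i in range(len(experts)):
        token_idx, slot_idx = torch.nonzero(masks[i], as_tuple=True)
        weight = probs[token_idx, slot_idx].unsqueeze(-1)
        y = torch.index_add(y, 0, token_idx, weight * experts[i](x[token_idx]))
    return y


with torch.no_grad():
    same = torch.allclose(moe_original(x), moe_workaround(x))
print(same)
```

Both versions still call nonzero, so the sketch says nothing about tracing; it only shows that the rewrites preserve the eager-mode result.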

The main missing operator is nonzero(x, as_tuple=True). The problem with this operator is that its output shape depends on the input values, so it is unknown at compile time and dynamic at runtime. I tried three alternatives: a NumberProxy with None as its value, a NumberProxy with a custom int subclass as its value, and a custom int subclass used directly. A plain -1 in the shape worked best.
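
To make the data dependence concrete, here is a small plain-PyTorch sketch. Two masks with the same shape produce nonzero outputs of different lengths. The `shape_matches` helper is purely hypothetical and is not part of Thunder; it only illustrates the idea of treating -1 as a placeholder for a dimension whose size is not known ahead of time.

```python
import torch

# Two boolean masks with identical shapes but different contents.
a = torch.tensor([[True, False], [False, False], [True, True]])
b = torch.tensor([[False, False], [False, True], [False, False]])

rows_a, cols_a = torch.nonzero(a, as_tuple=True)
rows_b, cols_b = torch.nonzero(b, as_tuple=True)

# Same input shape, different output lengths: the result shape
# depends on the values, not only on the input metadata.
print(a.shape == b.shape)          # True
print(rows_a.shape, rows_b.shape)  # torch.Size([3]) torch.Size([1])


def shape_matches(actual, symbolic):
    # Hypothetical helper: -1 in `symbolic` stands for "unknown size"
    # and matches any runtime extent in that dimension.
    return len(actual) == len(symbolic) and all(
        s == -1 or s == d for d, s in zip(actual, symbolic)
    )


# A single placeholder shape (-1,) is compatible with both runtime results.
print(shape_matches(rows_a.shape, (-1,)), shape_matches(rows_b.shape, (-1,)))
```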

The forward pass worked with just 14ce0978aaf32803884fe66b891b6a4ccd2fca7d. The backward pass required more special handling of -1.

Currently, index_add, index_select, topk are not fused with any of Thunder's fusing executors.

IvanYashchuk avatar Apr 30 '24 13:04 IvanYashchuk

Super exciting! Really looking forward to discussing this in more detail at a design review!

mruberry avatar May 01 '24 14:05 mruberry

@t-vi and @carmocca, I think you'll be interested in this

mruberry avatar May 01 '24 14:05 mruberry

Do we have any broader ideas for how this fits into the strategy for handling dynamic and data dependent shapes? I was under the impression that this was just something we were completely incapable of doing with the way that we're modeling traces.

apaz-cli avatar May 02 '24 02:05 apaz-cli

Thunder doesn't support advanced indexing with None (need to create an issue). The workaround is to use unsqueeze instead of None when indexing.

I vaguely remember running into None in indexing, but I think I only saw that with basic indexing.

jjsjann123 avatar May 08 '24 21:05 jjsjann123