Mixtral 8x7B network support for thunder.jit path

Open riccardofelluga opened this issue 1 year ago • 5 comments

🚀 Feature

Mixtral 8x7B is a mixture-of-experts LLM that splits its parameters into 8 distinct groups (experts), and I would like to run both training and inference on it with Thunder.

Work items

  • [x] Run thunder.examine
  • [ ] #124
  • [ ] #195
  • [x] #187
  • [ ] #303
  • [ ] Create Mixtral benchmark

Additional context

Even though thunder.examine does not flag any problem with the ops, some testing revealed two issues. First, Mixtral uses the single-argument torch.where(condition) signature of torch.where, which is not supported at the moment. Second, the indexing done in Mixtral's forward function fails: the _advanced_indexing clang operation currently does not accept None as a valid index alongside tensor indices.
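For reference, here is a minimal sketch of the two patterns in plain PyTorch eager mode. The tensor names, shapes, and values are made up for illustration and are not copied from the Mixtral source; the snippet only shows what the two operations do, not how Thunder would need to handle them.

```python
import torch

# Toy routing mask (hypothetical shape): rows are top-k slots, columns are tokens.
expert_mask = torch.tensor([[True, False, True, False],
                            [False, False, False, True]])

# Single-argument torch.where returns a tuple of index tensors, one per dimension.
# It is equivalent to torch.nonzero(expert_mask, as_tuple=True).
idx, top_x = torch.where(expert_mask)

# Toy activations with shape (tokens, hidden_dim).
hidden_states = torch.arange(12, dtype=torch.float32).reshape(4, 3)

# None mixed with a tensor index: None inserts a leading axis of size 1,
# and the tensor index gathers rows along the original first dimension.
current_state = hidden_states[None, top_x]

print(idx.tolist(), top_x.tolist(), tuple(current_state.shape))
```

Note that the length of idx and top_x depends on how many entries of the mask are true, so the output shape is only known once the data is.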

riccardofelluga avatar Apr 16 '24 07:04 riccardofelluga

Note that unless you rearrange the expert mixing relative to how it is commonly implemented, you will have data-dependent control flow.
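To illustrate the data dependence, here is a rough sketch of a typical per-expert routing loop, reduced to counting tokens. The function name and the toy logits are hypothetical, and real implementations do more work per expert; the point is only that two inputs of identical shape can route different numbers of tokens to each expert.

```python
import torch

def tokens_per_expert(router_logits: torch.Tensor, num_experts: int, top_k: int) -> list:
    # Pick the top_k experts for every token; selected has shape (tokens, top_k).
    _, selected = torch.topk(router_logits, top_k, dim=-1)
    counts = []
    for expert_idx in range(num_experts):
        # Indices of the tokens routed to this expert. The length of token_idx
        # depends on the values in router_logits, not only on its shape.
        token_idx, _ = torch.where(selected == expert_idx)
        counts.append(token_idx.numel())
    return counts

# Two inputs with the same shape (2 tokens, 4 experts) but different routing.
logits_a = torch.tensor([[3.0, 2.0, 1.0, 0.0], [3.0, 2.0, 1.0, 0.0]])
logits_b = torch.tensor([[3.0, 2.0, 1.0, 0.0], [0.0, 1.0, 2.0, 3.0]])

counts_a = tokens_per_expert(logits_a, num_experts=4, top_k=2)
counts_b = tokens_per_expert(logits_b, num_experts=4, top_k=2)
print(counts_a, counts_b)
```

Any branch or tensor shape derived from these per-expert counts therefore changes from batch to batch.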

t-vi avatar Apr 16 '24 09:04 t-vi

you will have data-dependent control flow.

Exactly! What is our current stance on data-dependent control flows?

riccardofelluga avatar Apr 16 '24 10:04 riccardofelluga

I don't think it's on the roadmap any time soon.

t-vi avatar Apr 16 '24 10:04 t-vi

Adding #303, which might be the key to getting the model supported in Thunder.

cc. @IvanYashchuk

riccardofelluga avatar May 07 '24 13:05 riccardofelluga

Update to this issue: Mixtral 8x7B is now supported via the ThunderFX path. The issues listed above remain for the thunder.jit code path.

riccardofelluga avatar Nov 13 '24 12:11 riccardofelluga