Mixtral 8x7B network support for thunder.jit path
## 🚀 Feature
Mixtral 8x7B is a mixture-of-experts LLM that splits its parameters into 8 distinct expert groups. I would like to do both training and inference with Thunder.
## Work items
- [x] Run `thunder.examine`
- [ ] #124
- [ ] #195
- [x] #187
- [ ] #303
- [ ] Create Mixtral benchmark
## Additional context
Even though `thunder.examine` does not flag any problem with the ops, some testing revealed that Mixtral uses the single-argument `torch.where(condition)` signature of `torch.where`, which is not supported at the moment. The second issue I was able to identify stems from the indexing done in Mixtral's forward function: at the moment, the `_advanced_indexing` clang operation does not accept `None` as a valid index alongside other tensor indices.
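To make the two unsupported patterns concrete, here is a minimal sketch of both using NumPy, whose semantics for these operations match PyTorch's (the actual Mixtral code uses `torch.where` and tensor indexing; the arrays and index values below are illustrative, not taken from the model):

```python
import numpy as np

# Pattern 1: the single-argument where(condition) form. It returns a tuple
# of index arrays (one per dimension) for the True positions, equivalent to
# torch.nonzero(condition, as_tuple=True) in PyTorch.
mask = np.array([[False, True],
                 [True, False]])
rows, cols = np.where(mask)   # index arrays for the True entries

# Pattern 2: advanced indexing that mixes integer index arrays with None.
# None inserts a new axis into the gathered result, so the index expression
# below gathers w[0, 1] and w[2, 3] and appends a trailing dimension.
w = np.arange(12).reshape(3, 4)
top_x = np.array([0, 2])
idx = np.array([1, 3])
gathered = w[top_x, idx, None]  # shape (2, 1)
```

In PyTorch, the first pattern can often be rewritten as `torch.nonzero(condition, as_tuple=True)`, which may sidestep the unsupported signature; the second pattern is harder to avoid because Mixtral uses it to broadcast per-token routing weights against expert outputs.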
Note that unless you rearrange the expert mixing relative to what is commonly implemented, you will have data-dependent control flow.
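To illustrate where the data-dependent control flow comes from, here is a simplified NumPy sketch of the commonly implemented MoE dispatch loop (this is an assumption-laden toy, not Mixtral's actual code: `moe_dispatch`, the expert count, and the stand-in "expert computation" are all hypothetical):

```python
import numpy as np

def moe_dispatch(router_logits, top_k=2):
    """Toy MoE dispatch: route each token to its top_k experts."""
    # Top-k expert ids per token (shape: [num_tokens, top_k]).
    topk = np.argsort(router_logits, axis=-1)[:, -top_k:]
    # Which experts actually received tokens -- this set depends on the
    # runtime router output, so the loop below is data-dependent.
    hit_experts = np.unique(topk)
    outputs = np.zeros(router_logits.shape[0])
    for e in hit_experts:                     # data-dependent loop bounds
        # Tokens assigned to expert e -- data-dependent index set.
        token_ids = np.where((topk == e).any(axis=-1))[0]
        outputs[token_ids] += 1.0             # stand-in for expert compute
    return outputs
```

Both the set of experts iterated over and the token indices gathered inside the loop depend on runtime values, which is exactly the kind of control flow a trace-based JIT cannot capture in a single static trace.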
> you will have data-dependent control flow.
Exactly! What is our current stance on data-dependent control flows?
I don't think it's on the roadmap any time soon.
Adding #303, which might be the key to getting the model supported in Thunder.
cc @IvanYashchuk
Update to this issue: Mixtral 8x7B is now supported via the ThunderFX path. The issues listed above remain for the `thunder.jit` code path.