Add the Phi 3.5 MoE model
The Phi 3.5 MoE model is a mixture-of-experts model with roughly 42B total parameters spread across 16 experts, of which 2 are active per token.
This PR implements the model and provides a simple inference example.
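To make the "16 experts, 2 active" routing concrete, here is a minimal, self-contained sketch of top-2 expert selection. This is illustrative only and does not reflect candle's actual tensor API: the router emits one logit per expert, the two highest-scoring experts are selected, and their outputs are mixed with softmax-normalized weights.

```rust
// Sketch of MoE top-2 routing (hypothetical, not candle's API).
// Given per-expert router logits, pick the two best experts and
// return (expert index, mixing weight) pairs.
fn top2(logits: &[f32]) -> [(usize, f32); 2] {
    // Sort expert indices by logit, descending.
    let mut idx: Vec<usize> = (0..logits.len()).collect();
    idx.sort_by(|&a, &b| logits[b].partial_cmp(&logits[a]).unwrap());
    let (i0, i1) = (idx[0], idx[1]);
    // Softmax over just the two selected logits yields the
    // mixing weights (subtract the max for numerical stability).
    let m = logits[i0].max(logits[i1]);
    let (e0, e1) = ((logits[i0] - m).exp(), (logits[i1] - m).exp());
    let z = e0 + e1;
    [(i0, e0 / z), (i1, e1 / z)]
}

fn main() {
    // 16 experts, as in Phi 3.5 MoE; hypothetical router logits.
    let logits: Vec<f32> = (0..16).map(|i| (i as f32 * 0.37).sin()).collect();
    let picked = top2(&logits);
    println!("{:?}", picked);
    // The two mixing weights are positive and sum to 1.
    assert!((picked[0].1 + picked[1].1 - 1.0).abs() < 1e-6);
}
```

In the real model each selected expert runs its own feed-forward block on the token, and the two outputs are summed with these weights.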
Additionally, this PR adds a `layers` module to `candle_transformers`. Perhaps we can use it to store reusable layers, such as the Phi 3 or Llama RoPE implementations. In particular, the Phi 3 RoPE implementation has been added there. It is currently only used in the MoE model, but could we also use it in the regular Phi 3 model?
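For reviewers unfamiliar with the layer being shared, here is a hedged, self-contained sketch of rotary position embeddings (RoPE) applied to a single head vector, using the split-half rotation (element `i` pairs with element `i + dim/2`). The function name and signature are illustrative, not the `candle_transformers` API.

```rust
// Illustrative RoPE sketch (hypothetical API, plain f32 slices).
// Rotates pairs (x[i], x[i + half]) by a position-dependent angle,
// with per-pair frequency base^(-2i/d).
fn rope(x: &[f32], pos: usize, base: f32) -> Vec<f32> {
    let d = x.len();
    assert!(d % 2 == 0, "head dim must be even");
    let half = d / 2;
    let mut out = vec![0.0f32; d];
    for i in 0..half {
        let freq = base.powf(-(2.0 * i as f32) / d as f32);
        let angle = pos as f32 * freq;
        let (s, c) = angle.sin_cos();
        // 2D rotation of the (x[i], x[i + half]) pair.
        out[i] = x[i] * c - x[i + half] * s;
        out[i + half] = x[i] * s + x[i + half] * c;
    }
    out
}

fn main() {
    let x = vec![1.0, 0.0, 0.0, 1.0];
    // Position 0 applies the identity rotation.
    assert_eq!(rope(&x, 0, 10000.0), x);
    println!("{:?}", rope(&x, 5, 10000.0));
}
```

Because the rotation is purely position-dependent and model-agnostic apart from the head dimension and base, a single shared implementation in `layers` seems like it should work for both the MoE and dense Phi 3 models.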