mlx-swift-examples
Add Phi 3.5 MoE
This is my attempt to port the Phi 3.5 MoE model from the Python implementation. Unfortunately I can't test it myself, since my MacBook doesn't have enough RAM. I marked two places in PhiMoE.swift with comments starting with !! which need to be checked. You can test this with ModelConfiguration.phi3_5MoE. Go ahead and make any necessary changes if you'd like, since I won't be able to run this myself.
I will give it a try on Tuesday!
It looks like there is a problem loading the weights:
Error: Mismatched parameter weight shape. Actual [16, 6400, 512], expected [16, 6400, 4096]
This is on SwitchLinear (maybe this error needs some more context).
SwitchLinear(bias=nil, inputDims=4096, numExperts=16, outputDims=6400)
And the parameters actually contain a biases entry, which is not expected here:
(lldb) po parameters.mapValues { $0.shape }
▿ [
biases: [16, 6400, 64],
scales: [16, 6400, 64],
weight: [16, 6400, 512]
]
I need to look into this further
OK, I think SwitchLinear isn't quite right -- it is missing the bias (at the very least):
- https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/models/switch_layers.py#L87
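To illustrate the missing piece, here is a toy plain-Swift sketch of a per-expert linear with an optional bias (names and shapes are illustrative, not the actual MLX implementation):

```swift
// Toy per-expert linear (plain Swift, illustrative only): the point is the
// optional bias, which the Python SwitchLinear applies when it is present.
func switchLinearForward(
    x: [Double],            // [inputDims]
    weight: [[[Double]]],   // [numExperts][outputDims][inputDims]
    bias: [[Double]]?,      // [numExperts][outputDims], optional
    expert: Int
) -> [Double] {
    var y = weight[expert].map { row in
        zip(row, x).reduce(0.0) { $0 + $1.0 * $1.1 }
    }
    if let bias {  // a port that drops this branch loses the bias parameter
        for i in y.indices { y[i] += bias[expert][i] }
    }
    return y
}

let x = [1.0, 2.0]
let w = [[[1.0, 0.0], [0.0, 1.0]]]  // one expert, identity weight
let b = [[1.0, 1.0]]
print(switchLinearForward(x: x, weight: w, bias: b, expert: 0))  // [2.0, 3.0]
```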
and I think the dimension mismatch comes down to quantization -- the SwitchLinear isn't being replaced by SwitchLinearQuantized because it doesn't implement the protocol:
- https://github.com/ml-explore/mlx-swift/blob/main/Source/MLXNN/Quantized.swift#L14
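For what it's worth, the reported shapes line up exactly with 4-bit packing. A quick sanity check in plain Swift, assuming mlx's usual defaults of 4 bits per value and group size 64 (assumptions, not read from the checkpoint):

```swift
// Sanity check of the quantized shapes reported above, assuming 4-bit
// quantization with group size 64 (illustrative arithmetic, not MLX code).
let numExperts = 16
let outputDims = 6400
let inputDims = 4096
let bits = 4
let groupSize = 64

// 4-bit values are packed eight per UInt32, so the weight's last axis
// shrinks from 4096 to 4096 * 4 / 32 = 512.
let packedCols = inputDims * bits / 32
// scales and biases carry one value per quantization group: 4096 / 64 = 64.
let groupCols = inputDims / groupSize

print([numExperts, outputDims, packedCols])  // [16, 6400, 512] -> weight
print([numExperts, outputDims, groupCols])   // [16, 6400, 64]  -> scales/biases
```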
and I think QuantizedSwitchLinear will need to be a subtype of SwitchLinear for the replacement to happen -- a declaration like @ModuleInfo(key: "gate_proj") var gateProj: SwitchLinear requires that the type be SwitchLinear or a subtype. Linear/QuantizedLinear are modeled the same way.
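To make the subtype requirement concrete, here is a toy plain-Swift sketch (no MLX types; names are illustrative) of why the replacement only works when QuantizedSwitchLinear subclasses SwitchLinear:

```swift
// Toy illustration (plain Swift, not MLX) of the replacement constraint:
// a property typed `SwitchLinear` can only hold a SwitchLinear or one of
// its subtypes, so quantization-time replacement must produce a subtype.
protocol Quantizable {
    func toQuantized(groupSize: Int, bits: Int) -> Quantizable
}

class SwitchLinear: Quantizable {
    func toQuantized(groupSize: Int, bits: Int) -> Quantizable {
        QuantizedSwitchLinear(groupSize: groupSize, bits: bits)
    }
}

class QuantizedSwitchLinear: SwitchLinear {
    let groupSize: Int
    let bits: Int

    init(groupSize: Int, bits: Int) {
        self.groupSize = groupSize
        self.bits = bits
    }

    // Already quantized, so further replacement is a no-op.
    override func toQuantized(groupSize: Int, bits: Int) -> Quantizable { self }
}

struct MoEMLP {
    // Stands in for `@ModuleInfo(key: "gate_proj") var gateProj: SwitchLinear`:
    // the stored type constrains what the replacement can be.
    var gateProj: SwitchLinear
}

var mlp = MoEMLP(gateProj: SwitchLinear())
mlp.gateProj = mlp.gateProj.toQuantized(groupSize: 64, bits: 4) as! SwitchLinear
print(mlp.gateProj is QuantizedSwitchLinear)  // true
```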
Thanks for that feedback. I've tried to make those changes, although I don't know how helpful this will be, since I unfortunately can't test it myself. If it's too much trouble and you'd prefer to focus on other priorities, don't worry about it.
OK, I will see if I can test/finish this -- it may be a few days before I get a chance.
Sorry for the long delay here -- was busy busy busy!
I fixed up the quantized switch layers -- there were some issues with how they got initialized.
The couple of lines you had flagged as possible issues were exactly the spots with problems. After a little testing, everything is set there.
Model loaded -> id("mlx-community/Phi-3.5-MoE-instruct-4bit")
Starting generation ...
What is the gravity on Mars and the moon? The gravity on Mars and the Moon is weaker than Earth's due to their smaller masses and sizes.
On Mars, gravity is about 38% of Earth's gravity. Specifically, the gravitational acceleration on Mars is approximately 3.71 meters per second squared (m/s²) compared to Earth's 9.81 m/s².
On the Moon, gravity is about 16.5% of Earth's gravity
------
Prompt: 13 tokens, 12.989868 tokens/s
Generation: 100 tokens, 61.009203 tokens/s, 1.639097s
That takes about 22 GB of memory to run.
I think this is ready to merge once tests pass.
Actually we may want to merge in the changes from #135 once that merges...
Nope, no conflicts.
Fantastic, thank you!