mlx-swift-examples icon indicating copy to clipboard operation
mlx-swift-examples copied to clipboard

Add Phi 3.5 MoE

Open DePasqualeOrg opened this issue 1 year ago • 5 comments

This is my attempt to port the Phi 3.5 MoE model from the Python implementation. Unfortunately I can't test it myself, since my MacBook doesn't have enough RAM. I marked two places in PhiMoE.swift with comments starting with !! which need to be checked. You can test this with ModelConfiguration.phi3_5MoE. Go ahead and make any necessary changes if you'd like, since I won't be able to run this myself.

DePasqualeOrg avatar Aug 31 '24 22:08 DePasqualeOrg

I will give it a try on Tuesday!

davidkoski avatar Sep 01 '24 04:09 davidkoski

It looks like there is a problem loading the weights:

Error: Mismatched parameter weight shape. Actual [16, 6400, 512], expected [16, 6400, 4096]

this is on SwitchLinear (maybe this error needs some more context).

SwitchLinear(bias=nil, inputDims=4096, numExperts=16, outputDims=6400)

And actually the parameters has a biases which is not expected here:

(lldb) po parameters.mapValues { $0.shape }
▿ [
  biases: [16, 6400, 64],
  scales: [16, 6400, 64],
  weight: [16, 6400, 512]
]

I need to look into this further

davidkoski avatar Sep 03 '24 22:09 davidkoski

OK, I think SwitchLinear isn't quite right -- it is missing the bias (at the very least):

  • https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/models/switch_layers.py#L87

and I think the dimension mismatch comes down to quantization -- the SwitchLinear isn't being replaced by SwitchLinearQuantized because it doesn't implement the protocol:

  • https://github.com/ml-explore/mlx-swift/blob/main/Source/MLXNN/Quantized.swift#L14

and I think QuantizedSwitchLinear will need to be a subtype of SwitchLinear for the replacement to happen -- this @ModuleInfo(key: "gate_proj") var gateProj: SwitchLinear requires that the type be SwitchLinear or a subtype. Linear/QuantizedLinear are modeled the same way.

davidkoski avatar Sep 04 '24 15:09 davidkoski

Thanks for that feedback. I've tried to make those changes, although I don't know how helpful this will be, since I unfortunately can't test it myself. If it's too much trouble and you'd prefer to focus on other priorities, don't worry about it.

DePasqualeOrg avatar Sep 04 '24 19:09 DePasqualeOrg

OK, I will see if I can test/finish this -- it may be a few days before I get a chance.

davidkoski avatar Sep 05 '24 16:09 davidkoski

Sorry for the long delay here -- was busy busy busy!

I fixed up the quantized switch layers -- there were some issues with how they got initialized.

The couple of lines you had commented as maybe issues were exactly the spots with problems. With a little testing everything is set there.

Model loaded -> id("mlx-community/Phi-3.5-MoE-instruct-4bit")
Starting generation ...
What is the gravity on Mars and the moon?The gravity on Mars and the Moon is weaker than Earth's due to their smaller masses and sizes.

On Mars, gravity is about 38% of Earth's gravity. Specifically, the gravitational acceleration on Mars is approximately 3.71 meters per second squared (m/s²) compared to Earth's 9.81 m/s².

On the Moon, gravity is about 16.5% of Earth's gravity
------
Prompt:     13 tokens, 12.989868 tokens/s
Generation: 100 tokens, 61.009203 tokens/s, 1.639097s

That takes about 22G to run.

I think this is ready to merge once tests pass.

davidkoski avatar Oct 22 '24 17:10 davidkoski

Actually we may want to merge in the changes from #135 once that merges...

Nope, no conflicts.

davidkoski avatar Oct 22 '24 17:10 davidkoski

Fantastic, thank you!

DePasqualeOrg avatar Oct 22 '24 17:10 DePasqualeOrg