mergekit
mixtral branch: dimension mismatch in `cheap_embed`
Here, the tensor in `cheap_embed` is 4-dimensional:
https://github.com/cg123/mergekit/blob/d55f654c2e70d3ac4ad6532de96e266aff2de931/mergekit/scripts/mixtral_moe.py#L87
However, `gate_vec` receives a 3-dimensional tensor:
https://github.com/cg123/mergekit/blob/d55f654c2e70d3ac4ad6532de96e266aff2de931/mergekit/scripts/mixtral_moe.py#L158-L161
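To make the mismatch easier to pin down, here is a minimal sketch of a rank check that could be placed at each of the two locations. It is not code from mergekit: `check_rank` is a hypothetical helper, and the shapes below are illustrative placeholders rather than values read from `mixtral_moe.py`. It works on plain shape tuples; with PyTorch you would pass `tuple(tensor.shape)`.

```python
def check_rank(name, shape, expected_ndim):
    """Raise ValueError if `shape` does not have exactly `expected_ndim` axes.

    `check_rank` is a hypothetical helper for this report, not a mergekit API.
    """
    shape = tuple(shape)
    if len(shape) != expected_ndim:
        raise ValueError(
            f"{name}: expected a {expected_ndim}-D tensor, "
            f"got {len(shape)}-D with shape {shape}"
        )
    return shape


# Illustrative shapes only (assumptions, not taken from the script).
cheap_embed_shape = (32, 8, 1, 4096)  # 4 axes, as on the cheap_embed path
gate_vec_shape = (32, 8, 4096)        # 3 axes, as gate_vec receives elsewhere

# A 3-D shape passes the check unchanged.
check_rank("gate_vec", gate_vec_shape, 3)

# A 4-D shape fails it, and the message names the offending tensor.
try:
    check_rank("cheap_embed", cheap_embed_shape, 3)
except ValueError as err:
    message = str(err)
```

If the extra axis in the 4-D case turns out to have length 1, dropping it before the tensor is used as `gate_vec` would be one possible fix, but that depends on what the axes actually represent in the script.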