ValueError: [conv] Expect the input channels in the input and weight array to match
The Vision Transformer examples in the README don't work for me. I get this error in both cases:
File ~/Documents/mlx-embeddings/.venv/lib/python3.12/site-packages/mlx/nn/layers/convolution.py:157, in Conv2d.__call__(self, x)
    156 def __call__(self, x):
--> 157     y = mx.conv2d(
    158         x, self.weight, self.stride, self.padding, self.dilation, self.groups
    159     )
    160     if "bias" in self:
    161         y = y + self.bias
ValueError: [conv] Expect the input channels in the input and weight array to match but got shapes - input: (1,384,3,384) and weight: (1152,14,14,3)
File ~/Documents/mlx-embeddings/.venv/lib/python3.12/site-packages/mlx/nn/layers/convolution.py:157, in Conv2d.__call__(self, x)
    156 def __call__(self, x):
--> 157     y = mx.conv2d(
    158         x, self.weight, self.stride, self.padding, self.dilation, self.groups
    159     )
    160     if "bias" in self:
    161         y = y + self.bias
ValueError: [conv] Expect the input channels in the input and weight array to match but got shapes - input: (2,384,3,384) and weight: (1152,14,14,3)
Hey @arogister
I will fix the README examples later today.
For now, here is the solution: https://github.com/Blaizzy/mlx-embeddings/issues/18#issuecomment-2777111054
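For anyone hitting the same error in the meantime, here is a small shape-only sketch of what the message seems to be saying. `mx.conv2d` expects channels-last input `(N, H, W, C)` and a weight of shape `(C_out, kH, kW, C_in)`, so the last axis of the input has to be 3 here, not 384. The helper names `permute` and `axes_to_channels_last` are made up for this illustration, and the guess that an already channels-last batch was transposed a second time is only an assumption; I have not checked it against the README code or the linked comment.

```python
def permute(shape, axes):
    """Return the shape that results from transposing `shape` by `axes`."""
    return tuple(shape[a] for a in axes)


def axes_to_channels_last(input_shape, in_channels):
    """Find the axis order that moves the channel axis of a 4-D batch to the end.

    Hypothetical helper: assumes axis 0 is the batch axis and that exactly one
    of the remaining axes has size `in_channels`.
    """
    if len(input_shape) != 4:
        raise ValueError(f"expected a 4-D shape, got {input_shape}")
    candidates = [i for i in (1, 2, 3) if input_shape[i] == in_channels]
    if len(candidates) != 1:
        raise ValueError(f"cannot identify the channel axis in {input_shape}")
    channel_axis = candidates[0]
    spatial_axes = [i for i in (1, 2, 3) if i != channel_axis]
    return (0, *spatial_axes, channel_axis)


# Shapes taken from the error message above.
weight_shape = (1152, 14, 14, 3)   # (C_out, kH, kW, C_in)
in_channels = weight_shape[-1]     # 3
reported = (1, 384, 3, 384)        # channel axis is in position 2

# Assumption: a channels-last batch was transposed as if it were channels-first.
reproduced = permute((1, 384, 384, 3), (0, 2, 3, 1))   # (1, 384, 3, 384)

# Axis order that puts the channel axis last again.
axes = axes_to_channels_last(reported, in_channels)    # (0, 1, 3, 2)
repaired = permute(reported, axes)                     # (1, 384, 384, 3)

print("reproduced:", reproduced)
print("axes:", axes, "->", repaired)
```

With a real array the same axis order could be passed to `mx.transpose(x, axes)`. Note that this only makes the shapes agree: if the cause really is a double transpose, height and width would still be swapped after this repair, so removing the extra transpose at the source is the safer fix.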