Softmax with rank 3 has wrong semantics and wrong MIL->NNv1 conversion.
🐞Describing the bug
There are essentially two bugs, the first propagating into the second:
- NeuralNetwork.proto states that the softmax layer supports rank >= 3 and operates on axis = -3. Experimentally this turns out to be false when rank = 3: in that case it actually operates on axis = -1. The documentation makes no mention of this, so the first bug appears to be a documentation issue:
```proto
/**
 * Softmax Normalization Layer
 *
 * A layer that performs softmax normalization.
 * Normalization is applied along axis = -3 or N-3 (where N is the rank of the input)
 * For softmax layer that can operate on any axis, see SoftmaxNDLayer.
 *
 * .. code::
 *
 *      y = SoftmaxLayer(x)
 *
 * Requires 1 input and produces 1 output.
 *
 * Input
 *     Must be a blob with rank >= 3.
 * Output
 *     A blob with the same shape as the input.
 *
 * This layer is described by the following formula:
 *
 * .. math::
 *     x_i \leftarrow \dfrac{e^{x_i}}{\sum_i{e^{x_i}}}
 */
```
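To make the mismatch concrete, here is a plain NumPy model of the two axis choices for the (2, 1, 1) input used in the reproduction below (`ref_softmax` is my own helper, not coremltools code):

```python
import numpy as np

def ref_softmax(x, axis):
    # Numerically stable reference softmax along a given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

x = np.array([0.5, 1.5]).reshape(2, 1, 1)

# Documented behavior (axis = -3): the two values normalize against each
# other, so the whole output sums to 1.
print(ref_softmax(x, axis=-3).sum())  # ≈ 1.0

# Observed behavior for rank 3 (axis = -1): each length-1 row softmaxes
# to 1.0 on its own, so the output sums to 2.
print(ref_softmax(x, axis=-1).sum())  # ≈ 2.0
```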
- The MIL -> NNv1 converter seems to take the documentation as gospel and propagates the issue into the conversion:
```python
@register_mil_to_nn_mapping
def softmax(const_context, builder, op):
    rank = op.x.rank
    if op.axis.val == -3 or op.axis.val > 0 and op.axis.val == rank - 3:
        builder.add_softmax(
            name=op.name, input_name=op.x.name, output_name=op.outputs[0].name,
        )
    else:
        builder.add_softmax_nd(
            name=op.name,
            input_name=op.x.name,
            output_name=op.outputs[0].name,
            axis=op.axis.val,
        )
```
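To illustrate the routing problem in isolation, here is a self-contained sketch (my own hypothetical function names, not the coremltools API, and not an actual patch) of the converter's current decision versus a possible guard that only trusts the builtin softmax layer for rank >= 4:

```python
def current_uses_builtin(rank, axis):
    # Mirrors the condition in the quoted converter code: the builtin
    # softmax layer is chosen whenever axis normalizes to rank - 3,
    # regardless of the input rank.
    return axis == -3 or (axis > 0 and axis == rank - 3)

def proposed_uses_builtin(rank, axis):
    # Hypothetical fix: only use the builtin layer when rank >= 4, where
    # it demonstrably normalizes over axis = -3; rank-3 inputs would fall
    # back to softmax_nd with an explicit axis.
    return rank >= 4 and (axis == -3 or (axis > 0 and axis == rank - 3))

for rank in (3, 4, 5):
    print(rank, current_uses_builtin(rank, -3), proposed_uses_builtin(rank, -3))
# rank 3 is the only case where the two disagree
```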
To Reproduce
Run this script as proof that softmax indeed operates on axis = -1 for rank 3 input:
```python
import coremltools as ct
from coremltools.converters.mil import Builder as mb
import numpy as np

input_shape = (2, 1, 1)
# input_shape = (1, 1, 2)  # <-- confirms softmax operates on axis=-1

@mb.program(input_specs=[mb.TensorSpec(shape=input_shape)])
def prog(x):
    x = mb.softmax(
        x=x,
        axis=-3,
        name="y",
    )
    return x

mlmodel = ct.convert(prog)

print("== MIL program ==")
print(prog)
print("== MLModel proto ==")
print(mlmodel.get_spec())

y = mlmodel.predict({"x": np.array([0.5, 1.5]).reshape(input_shape)})["y"]
print("== Output ==")
print(y)
```
The output is as follows:
```
== MIL program ==
main[CoreML3](%x: (2, 1, 1, fp32)(Tensor)) {
  block0() {
    %y: (2, 1, 1, fp32)(Tensor) = softmax(x=%x, axis=-3, name="y")
  } -> (%y)
}
== MLModel proto ==
specificationVersion: 4
description {
  input {
    name: "x"
    type {
      multiArrayType {
        shape: 2
        shape: 1
        shape: 1
        dataType: FLOAT32
      }
    }
  }
  output {
    name: "y"
    type {
      multiArrayType {
        dataType: FLOAT32
      }
    }
  }
  metadata {
    userDefined {
      key: "com.github.apple.coremltools.source"
      value: "milinternal"
    }
    userDefined {
      key: "com.github.apple.coremltools.version"
      value: "6.0"
    }
  }
}
neuralNetwork {
  layers {
    name: "y"
    input: "x"
    output: "y"
    softmax {
    }
  }
  arrayInputShapeMapping: EXACT_ARRAY_MAPPING
  imageInputShapeMapping: RANK4_IMAGE_MAPPING
}
== Output ==
[[[1.]]

 [[1.]]]
```
If the softmax were really over axis = -3 (i.e. axis = 0 here), the output would sum to 1, but it sums to 2. If you try input_shape = (1, 1, 2) instead, it is clear that the softmax is happening along axis = -1 (i.e. axis = 2). It is also easy to confirm that for rank >= 4, it does use axis = -3.
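For the (1, 1, 2) case, the same comparison in plain NumPy (my own reference implementation, no Core ML involved) shows why the observed output pins the axis to -1:

```python
import numpy as np

def ref_softmax(x, axis):
    # Reference softmax, independent of Core ML.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

x = np.array([0.5, 1.5]).reshape(1, 1, 2)

# axis = -3: every slice along that axis has length 1, so everything
# maps to 1.0.
print(ref_softmax(x, axis=-3).ravel())  # [1. 1.]
# axis = -1: matches what the converted model actually returns.
print(ref_softmax(x, axis=-1).ravel())  # ≈ [0.269 0.731]
```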
This looks like a bug, right? Is there anything I'm overlooking?
System environment (please complete the following information):
- coremltools version: 6.0 (but it looks like it would reproduce from main)
- OS: macOS 12.6
Looks like this is also an issue for the MIL -> mlprogram conversion:
```python
from scipy import special
import coremltools as ct
from coremltools.converters.mil import Builder as mb
import numpy as np

input_shape = (2, 1, 1)
axis = -3

@mb.program(input_specs=[mb.TensorSpec(shape=input_shape)])
def prog(x):
    x = mb.softmax(
        x=x,
        axis=axis,
        name="y",
    )
    return x

x = np.array([0.5, 1.5]).reshape(input_shape)
print(special.softmax(x, axis=axis))

mlmodel = ct.convert(prog, convert_to="mlprogram")
print(mlmodel.predict({"x": x})["y"])
```
prints:

```
[[[0.26894142]]

 [[0.73105858]]]
[[[1.]]

 [[1.]]]
```
Does this cause #1705 and #1749? If so, maybe worth some attention from @TobyRoseman and @tonybove-apple? Thanks in advance.