Softmax with rank 3 has wrong semantics and wrong MIL->NNv1 conversion.
🐞Describing the bug
There are essentially two bugs, the first propagating into the second:
- NeuralNetwork.proto states that the softmax layer supports rank >= 3 and operates on axis = -3. Experimentally this turns out to be false when rank = 3: in that case it actually operates on axis = -1. The documentation makes no mention of this, so the first bug appears to be a documentation issue:
```proto
/**
 * Softmax Normalization Layer
 *
 * A layer that performs softmax normalization.
 * Normalization is applied along axis = -3 or N-3 (where N is the rank of the input)
 * For softmax layer that can operate on any axis, see SoftmaxNDLayer.
 *
 * .. code::
 *
 *      y = SoftmaxLayer(x)
 *
 * Requires 1 input and produces 1 output.
 *
 * Input
 *     Must be a blob with rank >= 3.
 * Output
 *     A blob with the same shape as the input.
 *
 * This layer is described by the following formula:
 *
 * .. math::
 *     x_i \leftarrow \dfrac{e^{x_i}}{\sum_i{e^{x_i}}}
 */
```
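To make the mismatch concrete, here is a plain NumPy model of the two axis choices for the (2, 1, 1) input used in the reproduction below (`ref_softmax` is my own helper, not coremltools code):

```python
import numpy as np

def ref_softmax(x, axis):
    # Numerically stable reference softmax along a given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

x = np.array([0.5, 1.5]).reshape(2, 1, 1)

# Documented behavior (axis = -3): the two values normalize against each
# other, so the whole output sums to 1.
print(ref_softmax(x, axis=-3).sum())  # ≈ 1.0

# Observed behavior for rank 3 (axis = -1): each length-1 row softmaxes
# to 1.0 on its own, so the output sums to 2.
print(ref_softmax(x, axis=-1).sum())  # ≈ 2.0
```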
- The MIL -> NNv1 converter seems to take the documentation as gospel and propagates the issue into the conversion:
```python
@register_mil_to_nn_mapping
def softmax(const_context, builder, op):
    rank = op.x.rank
    if op.axis.val == -3 or op.axis.val > 0 and op.axis.val == rank - 3:
        builder.add_softmax(
            name=op.name, input_name=op.x.name, output_name=op.outputs[0].name,
        )
    else:
        builder.add_softmax_nd(
            name=op.name,
            input_name=op.x.name,
            output_name=op.outputs[0].name,
            axis=op.axis.val,
        )
```
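To illustrate the routing problem in isolation, here is a self-contained sketch (my own hypothetical function names, not the coremltools API, and not an actual patch) of the converter's current decision versus a possible guard that only trusts the builtin softmax layer for rank >= 4:

```python
def current_uses_builtin(rank, axis):
    # Mirrors the condition in the quoted converter code: the builtin
    # softmax layer is chosen whenever axis normalizes to rank - 3,
    # regardless of the input rank.
    return axis == -3 or (axis > 0 and axis == rank - 3)

def proposed_uses_builtin(rank, axis):
    # Hypothetical fix: only use the builtin layer when rank >= 4, where
    # it demonstrably normalizes over axis = -3; rank-3 inputs would fall
    # back to softmax_nd with an explicit axis.
    return rank >= 4 and (axis == -3 or (axis > 0 and axis == rank - 3))

for rank in (3, 4, 5):
    print(rank, current_uses_builtin(rank, -3), proposed_uses_builtin(rank, -3))
# rank 3 is the only case where the two disagree
```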
To Reproduce
Run this script as proof that softmax indeed operates on axis = -1 for rank 3 input:
```python
import coremltools as ct
from coremltools.converters.mil import Builder as mb
import numpy as np

input_shape = (2, 1, 1)
# input_shape = (1, 1, 2)  # <-- confirms softmax operates on axis=-1

@mb.program(input_specs=[mb.TensorSpec(shape=input_shape)])
def prog(x):
    x = mb.softmax(
        x=x,
        axis=-3,
        name="y",
    )
    return x

mlmodel = ct.convert(prog)

print("== MIL program ==")
print(prog)
print("== MLModel proto ==")
print(mlmodel.get_spec())

y = mlmodel.predict({"x": np.array([0.5, 1.5]).reshape(input_shape)})["y"]
print("== Output ==")
print(y)
```
The output is as follows:
```
== MIL program ==
main[CoreML3](%x: (2, 1, 1, fp32)(Tensor)) {
  block0() {
    %y: (2, 1, 1, fp32)(Tensor) = softmax(x=%x, axis=-3, name="y")
  } -> (%y)
}
== MLModel proto ==
specificationVersion: 4
description {
  input {
    name: "x"
    type {
      multiArrayType {
        shape: 2
        shape: 1
        shape: 1
        dataType: FLOAT32
      }
    }
  }
  output {
    name: "y"
    type {
      multiArrayType {
        dataType: FLOAT32
      }
    }
  }
  metadata {
    userDefined {
      key: "com.github.apple.coremltools.source"
      value: "milinternal"
    }
    userDefined {
      key: "com.github.apple.coremltools.version"
      value: "6.0"
    }
  }
}
neuralNetwork {
  layers {
    name: "y"
    input: "x"
    output: "y"
    softmax {
    }
  }
  arrayInputShapeMapping: EXACT_ARRAY_MAPPING
  imageInputShapeMapping: RANK4_IMAGE_MAPPING
}
== Output ==
[[[1.]]

 [[1.]]]
```
If the softmax were really over axis = -3 (i.e. axis = 0 here), the output would sum to 1, but it sums to 2. If you try input_shape = (1, 1, 2) instead, it is clear that the softmax is happening along axis = -1 (i.e. axis = 2). It is also easy to confirm that for rank >= 4, it does use axis = -3.
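For the (1, 1, 2) case, the same comparison in plain NumPy (my own reference implementation, no Core ML involved) shows why the observed output pins the axis to -1:

```python
import numpy as np

def ref_softmax(x, axis):
    # Reference softmax, independent of Core ML.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

x = np.array([0.5, 1.5]).reshape(1, 1, 2)

# axis = -3: every slice along that axis has length 1, so everything
# maps to 1.0.
print(ref_softmax(x, axis=-3).ravel())  # [1. 1.]
# axis = -1: matches what the converted model actually returns.
print(ref_softmax(x, axis=-1).ravel())  # ≈ [0.269 0.731]
```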
This looks like a bug, right? Is there anything I'm overlooking?
System environment (please complete the following information):
- coremltools version: 6.0 (but it looks like it would reproduce from main)
- OS: macOS 12.6
Looks like this is also an issue for the MIL -> mlprogram conversion:
```python
from scipy import special
import coremltools as ct
from coremltools.converters.mil import Builder as mb
import numpy as np

input_shape = (2, 1, 1)
axis = -3

@mb.program(input_specs=[mb.TensorSpec(shape=input_shape)])
def prog(x):
    x = mb.softmax(
        x=x,
        axis=axis,
        name="y",
    )
    return x

x = np.array([0.5, 1.5]).reshape(input_shape)
print(special.softmax(x, axis=axis))

mlmodel = ct.convert(prog, convert_to="mlprogram")
print(mlmodel.predict({"x": x})["y"])
```
prints:

```
[[[0.26894142]]

 [[0.73105858]]]
[[[1.]]

 [[1.]]]
```
Does this cause #1705 and #1749? If so, maybe worth some attention from @TobyRoseman and @tonybove-apple? Thanks in advance.