Result of Flexible-input-shape Model is NaN
🐞 Describing the bug
When I use EnumeratedShapes or RangeDim to generate a flexible-input-shape model and run inference, the result is all NaN.
Stack Trace
/opt/homebrew/anaconda3/envs/bce/bin/python /Users/jinmuchuan/projects/BCEmbedding/model.py
torch.int32
When both 'convert_to' and 'minimum_deployment_target' not specified, 'convert_to' is set to "mlprogram" and 'minimum_deployment_targer' is set to ct.target.iOS15 (which is same as ct.target.macOS12). Note: the model will not run on systems older than iOS15/macOS12/watchOS8/tvOS15. In order to make your model run on older system, please set the 'minimum_deployment_target' to iOS14/iOS13. Details please see the link: https://coremltools.readme.io/docs/unified-conversion-api#target-conversion-formats
Tuple detected at graph output. This will be flattened in the converted model.
Converting PyTorch Frontend ==> MIL Ops: 0%| | 0/672 [00:00<?, ? ops/s]Core ML embedding (gather) layer does not support any inputs besides the weights and indices. Those given will be ignored.
Core ML embedding (gather) layer does not support any inputs besides the weights and indices. Those given will be ignored.
Converting PyTorch Frontend ==> MIL Ops: 100%|██████████| 670/672 [00:00<00:00, 5151.90 ops/s]
Running MIL frontend_pytorch pipeline: 100%|██████████| 5/5 [00:00<00:00, 458.38 passes/s]
Running MIL default pipeline: 0%| | 0/71 [00:00<?, ? passes/s]/opt/homebrew/anaconda3/envs/bce/lib/python3.10/site-packages/coremltools/converters/mil/mil/passes/defs/preprocess.py:267: UserWarning: Output, '1617', of the source model, has been renamed to 'var_1617' in the Core ML model.
warnings.warn(msg.format(var.name, new_name))
Running MIL default pipeline: 59%|██████ | 42/71 [00:00<00:00, 135.52 passes/s]/opt/homebrew/anaconda3/envs/bce/lib/python3.10/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py:894: RuntimeWarning: overflow encountered in cast
return input_var.val.astype(dtype=string_to_nptype(dtype_val))
Running MIL default pipeline: 100%|██████████| 71/71 [00:13<00:00, 5.44 passes/s]
Running MIL backend_mlprogram pipeline: 100%|██████████| 12/12 [00:00<00:00, 480.47 passes/s]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
{'output': array([[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]]], dtype=float32), 'var_1617': array([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], dtype=float32)}
Process finished with exit code 0
To Reproduce
import numpy as np
import torch
import coremltools as ct
from transformers import AutoModel, AutoTokenizer
sentences = ['sentence_0', 'sentence_1']
tokenizer = AutoTokenizer.from_pretrained('maidalun1020/bce-embedding-base_v1')
model = AutoModel.from_pretrained('maidalun1020/bce-embedding-base_v1', return_dict=False)
device = 'cpu' # if no GPU, set "cpu"
model.to(device)
example_input = torch.randint(0, 10, size=(1, 128)).type(torch.int32)
print(example_input.dtype)
traced_script_module = torch.jit.trace(model.eval(), (example_input, example_input))
input_shape = ct.Shape(shape=(ct.RangeDim(lower_bound=1, upper_bound=8),
ct.RangeDim(lower_bound=1, upper_bound=512)))
mlmodel = ct.convert(traced_script_module,
inputs=[ct.TensorType(shape=input_shape, dtype=np.int32, name="input_ids"),
ct.TensorType(shape=input_shape, dtype=np.int32, name="attention_mask")],
outputs=[ct.TensorType(dtype=np.float32, name="output"),
ct.TensorType(dtype=np.float32, name="1617")]
)
mlmodel.save('embed.mlpackage')
inputs = tokenizer(sentences, padding=True, truncation=True, max_length=512, return_tensors="np")
inputs_on_device = {k: v.astype(np.int32) for k, v in inputs.items()}
out_dict = mlmodel.predict(inputs_on_device)
print(out_dict)
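As a sanity check, the traced PyTorch model can be run on the same tokenized inputs to get a reference output to compare against the Core ML prediction (a minimal sketch, assuming the script above has already been executed):
# Sketch: run the traced model on the same tokenized inputs as a reference.
pt_inputs = tokenizer(sentences, padding=True, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    pt_out = traced_script_module(pt_inputs["input_ids"].to(torch.int32),
                                  pt_inputs["attention_mask"].to(torch.int32))
print(pt_out[0].numpy())   # reference last_hidden_state from PyTorch
print(out_dict["output"])  # Core ML output (all NaN here)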
System environment (please complete the following information):
- coremltools version: 7.1
- OS (e.g. MacOS version or Linux type): MacOS 13.3.1 (a)
- Any other relevant version information (e.g. PyTorch or TensorFlow version): torch==2.1.0
Loading an untrusted PyTorch model is a security risk, so I'm unable to reproduce your results. It would be great if you could give us a minimal example (i.e. one that doesn't require loading an external model).
Does the output match if you convert with fixed shapes?
This model is from Hugging Face. I'm not sure which layer causes this bug, so it's difficult for me to construct a minimal example. However, the output of the fixed-shape model is correct.
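For reference, the fixed-shape conversion I compared against looks roughly like this (a sketch, reusing the names from the reproduction script; the fixed shape is simply the padded tokenizer output):
# Sketch: same conversion as above, but with a fixed shape taken from the
# padded tokenizer output instead of RangeDim.
fixed_shape = inputs["input_ids"].shape
mlmodel_fixed = ct.convert(
    traced_script_module,
    inputs=[ct.TensorType(shape=fixed_shape, dtype=np.int32, name="input_ids"),
            ct.TensorType(shape=fixed_shape, dtype=np.int32, name="attention_mask")],
    outputs=[ct.TensorType(dtype=np.float32, name="output"),
             ct.TensorType(dtype=np.float32, name="1617")],
)
print(mlmodel_fixed.predict(inputs_on_device))  # finite values, no NaN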
I completely understand. Unfortunately, without a minimal example, it's difficult for me to help you.
Since the fixed shape works, the issue is almost certainly related to the flexible shape. For debugging purposes, there are a few more things you could try (a rough sketch of 2 and 3 follows the list):
1 - Verify that the traced PyTorch model still works for shapes within the range of the flexible shape but different from the shapes it was traced on.
2 - See if the model converts correctly with a fixed input_ids shape but a flexible attention_mask shape.
3 - See if the model converts correctly with a fixed attention_mask shape but a flexible input_ids shape.
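A minimal sketch of what 2 and 3 could look like, reusing the names from your reproduction script (swap which input gets the fixed shape to cover both cases):
# Sketch of suggestion 2: fixed input_ids shape, flexible attention_mask shape.
flexible = ct.Shape(shape=(ct.RangeDim(lower_bound=1, upper_bound=8),
                           ct.RangeDim(lower_bound=1, upper_bound=512)))
fixed = inputs["input_ids"].shape
mlmodel_mixed = ct.convert(
    traced_script_module,
    inputs=[ct.TensorType(shape=fixed, dtype=np.int32, name="input_ids"),
            ct.TensorType(shape=flexible, dtype=np.int32, name="attention_mask")],
    outputs=[ct.TensorType(dtype=np.float32, name="output"),
             ct.TensorType(dtype=np.float32, name="1617")],
)
print(mlmodel_mixed.predict(inputs_on_device))
# For suggestion 3, swap `fixed` and `flexible` between the two inputs.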
- The traced model works well for shapes different from the shapes it was traced on.
- When running the model converted with a fixed input_ids shape but a flexible attention_mask shape, I get an error:
Traceback (most recent call last):
File "/Users/jinmuchuan/projects/BCEmbedding/model.py", line 48, in <module>
out_dict = mlmodel.predict(inputs_on_device)
File "/opt/homebrew/anaconda3/envs/bce/lib/python3.10/site-packages/coremltools/models/model.py", line 596, in predict
return MLModel._get_predictions(self.__proxy__, verify_and_convert_input_dict, data)
File "/opt/homebrew/anaconda3/envs/bce/lib/python3.10/site-packages/coremltools/models/model.py", line 648, in _get_predictions
return proxy.predict(data)
RuntimeError: {
NSLocalizedDescription = "Failed to build the model execution plan using a model architecture file '/private/var/folders/sn/xnnh_7q94y9fx18c0g26rt716qppp1/T/tmpf0e8pl49.mlmodelc/model.mil' with error code: -7.";
}
- When running the model converted with a fixed attention_mask shape but a flexible input_ids shape, I get NaN.