Result of Flexible-input-shape Model is NaN
🐞 Describing the bug
When I use EnumeratedShapes or RangeDim to generate a flexible-input-shape model and run inference, the result is all NaN.
Stack Trace
/opt/homebrew/anaconda3/envs/bce/bin/python /Users/jinmuchuan/projects/BCEmbedding/model.py
torch.int32
When both 'convert_to' and 'minimum_deployment_target' not specified, 'convert_to' is set to "mlprogram" and 'minimum_deployment_targer' is set to ct.target.iOS15 (which is same as ct.target.macOS12). Note: the model will not run on systems older than iOS15/macOS12/watchOS8/tvOS15. In order to make your model run on older system, please set the 'minimum_deployment_target' to iOS14/iOS13. Details please see the link: https://coremltools.readme.io/docs/unified-conversion-api#target-conversion-formats
Tuple detected at graph output. This will be flattened in the converted model.
Converting PyTorch Frontend ==> MIL Ops: 0%| | 0/672 [00:00<?, ? ops/s]Core ML embedding (gather) layer does not support any inputs besides the weights and indices. Those given will be ignored.
Core ML embedding (gather) layer does not support any inputs besides the weights and indices. Those given will be ignored.
Converting PyTorch Frontend ==> MIL Ops: 100%|██████████| 670/672 [00:00<00:00, 5151.90 ops/s]
Running MIL frontend_pytorch pipeline: 100%|██████████| 5/5 [00:00<00:00, 458.38 passes/s]
Running MIL default pipeline: 0%| | 0/71 [00:00<?, ? passes/s]/opt/homebrew/anaconda3/envs/bce/lib/python3.10/site-packages/coremltools/converters/mil/mil/passes/defs/preprocess.py:267: UserWarning: Output, '1617', of the source model, has been renamed to 'var_1617' in the Core ML model.
warnings.warn(msg.format(var.name, new_name))
Running MIL default pipeline: 59%|██████ | 42/71 [00:00<00:00, 135.52 passes/s]/opt/homebrew/anaconda3/envs/bce/lib/python3.10/site-packages/coremltools/converters/mil/mil/ops/defs/iOS15/elementwise_unary.py:894: RuntimeWarning: overflow encountered in cast
return input_var.val.astype(dtype=string_to_nptype(dtype_val))
Running MIL default pipeline: 100%|██████████| 71/71 [00:13<00:00, 5.44 passes/s]
Running MIL backend_mlprogram pipeline: 100%|██████████| 12/12 [00:00<00:00, 480.47 passes/s]
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
{'output': array([[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]],
[[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]]], dtype=float32), 'var_1617': array([[nan, nan, nan, ..., nan, nan, nan],
[nan, nan, nan, ..., nan, nan, nan]], dtype=float32)}
Process finished with exit code 0
To Reproduce
import numpy as np
import torch
import coremltools as ct
from transformers import AutoModel, AutoTokenizer
sentences = ['sentence_0', 'sentence_1']
tokenizer = AutoTokenizer.from_pretrained('maidalun1020/bce-embedding-base_v1')
model = AutoModel.from_pretrained('maidalun1020/bce-embedding-base_v1', return_dict=False)
device = 'cpu' # if no GPU, set "cpu"
model.to(device)
example_input = torch.randint(0, 10, size=(1, 128)).type(torch.int32)
print(example_input.dtype)
traced_script_module = torch.jit.trace(model.eval(), (example_input, example_input))
input_shape = ct.Shape(shape=(ct.RangeDim(lower_bound=1, upper_bound=8),
ct.RangeDim(lower_bound=1, upper_bound=512)))
mlmodel = ct.convert(traced_script_module,
inputs=[ct.TensorType(shape=input_shape, dtype=np.int32, name="input_ids"),
ct.TensorType(shape=input_shape, dtype=np.int32, name="attention_mask")],
outputs=[ct.TensorType(dtype=np.float32, name="output"),
ct.TensorType(dtype=np.float32, name="1617")]
)
mlmodel.save('embed.mlpackage')
inputs = tokenizer(sentences, padding=True, truncation=True, max_length=512, return_tensors="np")
inputs_on_device = {k: v.astype(np.int32) for k, v in inputs.items()}
out_dict = mlmodel.predict(inputs_on_device)
print(out_dict)
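As a sanity check, the traced PyTorch model can be run on the same tokenized inputs to get a reference output to compare against the Core ML prediction (a minimal sketch, assuming the script above has already been executed):
# Sketch: run the traced model on the same tokenized inputs as a reference.
pt_inputs = tokenizer(sentences, padding=True, truncation=True, max_length=512, return_tensors="pt")
with torch.no_grad():
    pt_out = traced_script_module(pt_inputs["input_ids"].to(torch.int32),
                                  pt_inputs["attention_mask"].to(torch.int32))
print(pt_out[0].numpy())   # reference last_hidden_state from PyTorch
print(out_dict["output"])  # Core ML output (all NaN here)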
System environment (please complete the following information):
- coremltools version: 7.1
- OS (e.g. MacOS version or Linux type): MacOS 13.3.1 (a)
- Any other relevant version information (e.g. PyTorch or TensorFlow version): torch==2.1.0
Loading an untrusted PyTorch model is a security risk, so I'm unable to reproduce your results. It would be great if you could give us a minimal example (i.e. one that doesn't require loading an external model).
Does the output match if you convert with fixed shapes?
This model is from Hugging Face. I'm not sure which layer causes this bug, so it's difficult for me to construct a minimal example. However, the output of the fixed-shape model is correct.
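For reference, the fixed-shape conversion I compared against looks roughly like this (a sketch, reusing the names from the reproduction script; the fixed shape is simply the padded tokenizer output):
# Sketch: same conversion as above, but with a fixed shape taken from the
# padded tokenizer output instead of RangeDim.
fixed_shape = inputs["input_ids"].shape
mlmodel_fixed = ct.convert(
    traced_script_module,
    inputs=[ct.TensorType(shape=fixed_shape, dtype=np.int32, name="input_ids"),
            ct.TensorType(shape=fixed_shape, dtype=np.int32, name="attention_mask")],
    outputs=[ct.TensorType(dtype=np.float32, name="output"),
             ct.TensorType(dtype=np.float32, name="1617")],
)
print(mlmodel_fixed.predict(inputs_on_device))  # finite values, no NaN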
I completely understand. Unfortunately, without a minimal example, it's difficult for me to help you.
Since the fixed shape works, the issue is almost certainly related to the flexible shape. For debugging purposes, there are a few more things you could try (a rough sketch of 2 and 3 follows the list):
1 - Verify that the traced PyTorch model still works for shapes within the range of the flexible shape but different from the shapes it was traced on.
2 - See if the model converts correctly with a fixed input_ids shape but a flexible attention_mask shape.
3 - See if the model converts correctly with a fixed attention_mask shape but a flexible input_ids shape.
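A minimal sketch of what 2 and 3 could look like, reusing the names from your reproduction script (swap which input gets the fixed shape to cover both cases):
# Sketch of suggestion 2: fixed input_ids shape, flexible attention_mask shape.
flexible = ct.Shape(shape=(ct.RangeDim(lower_bound=1, upper_bound=8),
                           ct.RangeDim(lower_bound=1, upper_bound=512)))
fixed = inputs["input_ids"].shape
mlmodel_mixed = ct.convert(
    traced_script_module,
    inputs=[ct.TensorType(shape=fixed, dtype=np.int32, name="input_ids"),
            ct.TensorType(shape=flexible, dtype=np.int32, name="attention_mask")],
    outputs=[ct.TensorType(dtype=np.float32, name="output"),
             ct.TensorType(dtype=np.float32, name="1617")],
)
print(mlmodel_mixed.predict(inputs_on_device))
# For suggestion 3, swap `fixed` and `flexible` between the two inputs.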
- The traced model works well for shapes different from the shapes it was traced on.
- When running the model converted with a fixed input_ids shape but a flexible attention_mask shape, I get an error:
Traceback (most recent call last):
File "/Users/jinmuchuan/projects/BCEmbedding/model.py", line 48, in <module>
out_dict = mlmodel.predict(inputs_on_device)
File "/opt/homebrew/anaconda3/envs/bce/lib/python3.10/site-packages/coremltools/models/model.py", line 596, in predict
return MLModel._get_predictions(self.__proxy__, verify_and_convert_input_dict, data)
File "/opt/homebrew/anaconda3/envs/bce/lib/python3.10/site-packages/coremltools/models/model.py", line 648, in _get_predictions
return proxy.predict(data)
RuntimeError: {
NSLocalizedDescription = "Failed to build the model execution plan using a model architecture file '/private/var/folders/sn/xnnh_7q94y9fx18c0g26rt716qppp1/T/tmpf0e8pl49.mlmodelc/model.mil' with error code: -7.";
}
- When running the model converted with a fixed attention_mask shape but a flexible input_ids shape, I get NaN.