EXC_BAD_ACCESS (code=1, address=0x0) in models with multiple LayerNorm layers
🐞Describing the bug
CoreML crashes in [MLNeuralNetworkEngine predictionFromFeatures:options:error:] with EXC_BAD_ACCESS (code=1, address=0x0) when the model contains multiple Conv1D layers, each followed by a LayerNorm layer. With the same Conv1D configuration but without the LayerNorm layers, CoreML runs without crashing. The model was created with TensorFlow 2.6.2 on Python 3.9.12 and converted with coremltools 5.2.0. I'm filing this report here because I'm not sure whether it is an issue with coremltools or with CoreML itself.
Stack Trace
* thread #1, queue = 'com.apple.CoreMLBatchProcessingQueue', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
frame #0: 0x00000001bb52711c libsystem_platform.dylib`_platform_memmove + 76
frame #1: 0x00000001d0589b58 Espresso`EspressoLight::espresso_plan::__copy_inputs(std::__1::shared_ptr<EspressoLight::plan_task_t>, std::__1::shared_ptr<Espresso::abstract_batch> const&, int, std::__1::shared_ptr<Espresso::net>) + 1436
frame #2: 0x00000001d0588e2c Espresso`EspressoLight::espresso_plan::dispatch_task_on_compute_batch(std::__1::shared_ptr<Espresso::abstract_batch> const&, std::__1::shared_ptr<EspressoLight::plan_task_t> const&) + 504
frame #3: 0x00000001d0593a24 Espresso`EspressoLight::espresso_plan::execute_sync() + 412
frame #4: 0x00000001d0599388 Espresso`espresso_plan_execute_sync + 132
frame #5: 0x00000001c316bf40 CoreML`-[MLNeuralNetworkEngine executePlan:error:] + 136
frame #6: 0x00000001c316c5e8 CoreML`-[MLNeuralNetworkEngine evaluateInputs:bufferIndex:options:error:] + 728
frame #7: 0x00000001c316ead4 CoreML`__54-[MLNeuralNetworkEngine evaluateInputs:options:error:]_block_invoke + 44
frame #8: 0x00000001005f63a8 libdispatch.dylib`_dispatch_client_callout + 20
frame #9: 0x000000010060ab94 libdispatch.dylib`_dispatch_lane_barrier_sync_invoke_and_complete + 192
frame #10: 0x00000001c316e8f8 CoreML`-[MLNeuralNetworkEngine evaluateInputs:options:error:] + 376
frame #11: 0x00000001c3163568 CoreML`__62-[MLNeuralNetworkEngine predictionFromFeatures:options:error:]_block_invoke + 128
frame #12: 0x00000001005f63a8 libdispatch.dylib`_dispatch_client_callout + 20
frame #13: 0x000000010060ab94 libdispatch.dylib`_dispatch_lane_barrier_sync_invoke_and_complete + 192
frame #14: 0x00000001c31633dc CoreML`-[MLNeuralNetworkEngine predictionFromFeatures:options:error:] + 436
* frame #15: 0x000000010000d958 MyCoreMLCmdApp`MyModel.prediction(input=0x00000001010af390, options=0x00000001010ac080, self=0x00000001010cb250) at MyModel.swift:224:37
frame #16: 0x000000010000d878 MyCoreMLCmdApp`MyModel.prediction(input=0x00000001010af390, self=0x00000001010cb250) at MyModel.swift:209:25
frame #17: 0x00000001000045c8 MyCoreMLCmdApp`main at main.swift:10:25
frame #18: 0x000000010004108c dyld`start + 520
To Reproduce
# Python 3.9.12
# pip install tensorflow==2.6.2 coremltools==5.2.0 "protobuf<=3.20"
import coremltools as ct
import tensorflow as tf
import os
def make_model(conv_layer_definitions: list[int]) -> tf.keras.Model:
    input = tf.keras.layers.Input(shape=(4096, 2))
    output = input
    for conv_layer_definition in conv_layer_definitions:
        # Conv1D(filters, kernel_size=8, strides=2, padding="same")
        output = tf.keras.layers.Conv1D(conv_layer_definition, 8, 2, "same")(output)
        # Uncomment this line to reproduce the problem
        # output = tf.keras.layers.LayerNormalization()(output)
    model = tf.keras.Model(inputs=input, outputs=output)
    model.compile(optimizer="SGD", loss="binary_crossentropy")
    model.summary()
    return model
conv_layer_definitions_not_working = [
    [32, 32, 64, 64, 128, 128, 256],
    [32, 32, 64, 64, 64, 64],
    [32, 32, 64, 64, 64],
    [128, 128, 128, 128],
    [256, 256, 256],
    [512, 512, 512],
    # Edited: This works
    # [4096],
]
conv_layer_definitions_works_well = [
    [32, 32, 64, 64],
    [64, 64, 64, 64],
    [128, 128, 128],
    [256, 256],
    [512, 512],
    [4096],
]
# replace this with conv_layer_definitions_not_working[...] and the Swift program below will crash as reported above.
conv_layer_definitions = conv_layer_definitions_works_well[-1]
os.system("rm -rf MyModel.mlpackage")
converted_model: ct.models.MLModel = ct.convert(
    make_model(conv_layer_definitions), convert_to="mlprogram", source="tensorflow"
)
converted_model.save("MyModel.mlpackage")
import CoreML
let model = try! MyModel()
let inputData = [Float](repeating: 0.1, count: 4096 * 2)
let inputArray = MLShapedArray(scalars: inputData, shape: [1, 4096, 2])
let input = MyModelInput(input_1: inputArray)
// Crashes here
let output = try! model.prediction(input: input)
print("\(output.IdentityShapedArray.shape)")
Model training environment:
- coremltools version: 5.2.0
- OS: Ubuntu 18.04.6 LTS
- Python version: 3.9.12
- TensorFlow version: 2.6.2
Deployment target:
- Xcode version: 13.4.1 (13F100)
- OS: macOS 12.5
It seems like LayerNorm produces pretty verbose MIL code... maybe a malloc()-like function used inside CoreML returns nullptr when an allocation fails?
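For reference, this is roughly how I inspected the generated MIL (only a sketch: it assumes the milinternal target of ct.convert, which should return the MIL program object so it can be printed, and it reuses make_model and the layer lists from the reproduction script above):
# Sketch: dump the MIL program to see what LayerNormalization lowers to.
# Assumes ct.convert accepts convert_to="milinternal" (returning a MIL Program
# rather than an MLModel); make_model and conv_layer_definitions_not_working
# come from the reproduction script above.
mil_program = ct.convert(
    make_model(conv_layer_definitions_not_working[0]),
    convert_to="milinternal",
    source="tensorflow",
)
print(mil_program)  # prints every MIL op, including those emitted for LayerNorm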
I can't reproduce this issue using our latest beta release.
In order to get it to fail, I'm supposed to replace this line:
conv_layer_definitions = conv_layer_definitions_works_well[-1]
with this:
conv_layer_definitions = conv_layer_definitions_not_working[-1]
Is that right?
If that is correct, then please try installing our latest beta release (via pip install coremltools --pre) and see if that fixes the issue.
Also can you successfully get predictions from your converted model in Python?
Seems like [4096] works (it didn't when I initially tried it), but [32, 32, 64, 64, 128, 128, 256] definitely does not. Please try with conv_layer_definitions_not_working[0]. TensorFlow successfully gets predictions from my model, but CoreML crashes as shown below. I'll try 6.0 as you suggested.
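(For the configurations that do work, this is roughly how I compare the two runtimes; it is only a sketch: the names follow the reproduction script above and the FP16 tolerance is a guess.)
# Sketch: sanity-check a *working* configuration by comparing TensorFlow and
# Core ML outputs on the same input. make_model and conv_layer_definitions_works_well
# come from the reproduction script above.
import numpy as np

x = np.zeros((1, 4096, 2), np.float32)
tf_model = make_model(conv_layer_definitions_works_well[-1])
tf_out = tf_model(x).numpy()

coreml_model = ct.convert(tf_model, convert_to="mlprogram", source="tensorflow")
coreml_out = coreml_model.predict({"input_1": x})["Identity"]

# mlprogram models run in FP16 by default, so use a loose tolerance.
print(np.allclose(tf_out, coreml_out, atol=1e-2))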
Changes in the Python script:
...
import numpy as np  # added for np.zeros below
...
conv_layer_definitions = conv_layer_definitions_not_working[0]
...
converted_model: ct.models.MLModel = ct.convert(
    make_model(conv_layer_definitions), convert_to="mlprogram", source="tensorflow"
)
print("predicting...")
prediction_result = converted_model.predict({
    "input_1": np.zeros((1, 4096, 2), np.float32)
})
print(prediction_result["Identity"].shape)
converted_model.save("MyModel.mlpackage")
Execution result:
paxbun@PAXBUN-MAC conversion % cat main.py
...
conv_layer_definitions = conv_layer_definitions_not_working[0]
...
converted_model: ct.models.MLModel = ct.convert(
    make_model(conv_layer_definitions), convert_to="mlprogram", source="tensorflow"
)
print("predicting...")
prediction_result = converted_model.predict({
    "input_1": np.zeros((1, 4096, 2), np.float32)
})
print(prediction_result["Identity"].shape)
converted_model.save("MyModel.mlpackage")
paxbun@PAXBUN-MAC conversion % python main.py
...
Running TensorFlow Graph Passes: 100%|██████████████████████| 6/6 [00:00<00:00, 32.58 passes/s]
Converting Frontend ==> MIL Ops: 100%|████████████████████| 247/247 [00:00<00:00, 953.60 ops/s]
Running MIL Common passes: 100%|█████████████████████████| 34/34 [00:00<00:00, 206.78 passes/s]
Running MIL FP16ComputePrecision pass: 100%|████████████████| 1/1 [00:00<00:00, 3.54 passes/s]
Running MIL Clean up passes: 100%|██████████████████████████| 9/9 [00:00<00:00, 20.85 passes/s]
predicting...
zsh: segmentation fault python main.py
/opt/homebrew/Cellar/[email protected]/3.9.13_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
paxbun@PAXBUN-MAC conversion % echo $?
130
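One more experiment that might help narrow this down (again only a sketch: it assumes the compute_units argument of ct.convert available with coremltools 5.x on macOS 12+, and reuses the definitions from the script above) would be forcing CPU-only execution:
# Sketch: check whether the crash is tied to a particular compute unit by
# converting with CPU-only execution before predicting. Assumes the
# compute_units argument of ct.convert; make_model, np, ct, and
# conv_layer_definitions_not_working come from the script above.
cpu_only_model = ct.convert(
    make_model(conv_layer_definitions_not_working[0]),
    convert_to="mlprogram",
    source="tensorflow",
    compute_units=ct.ComputeUnit.CPU_ONLY,
)
print("predicting (CPU only)...")
cpu_out = cpu_only_model.predict({"input_1": np.zeros((1, 4096, 2), np.float32)})
print(cpu_out["Identity"].shape)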
coremltools==6.0b2 does not work either:
(base) paxbun@PAXBUN-MAC conversion % conda create -n foo python=3.9
...
(base) paxbun@PAXBUN-MAC conversion % conda activate foo
(foo) paxbun@PAXBUN-MAC conversion % pip install tensorflow-macos==2.8.0 coremltools==6.0b2 "protobuf<=3.20" --pre
...
Installing collected packages: tf-estimator-nightly, termcolor, tensorboard-plugin-wit, mpmath, libclang, keras, flatbuffers, zipp, wrapt, urllib3, typing-extensions, tqdm, tensorboard-data-server, sympy, six, pyparsing, pyasn1, protobuf, oauthlib, numpy, MarkupSafe, idna, gast, charset-normalizer, cachetools, absl-py, werkzeug, rsa, requests, pyasn1-modules, packaging, opt-einsum, keras-preprocessing, importlib-metadata, h5py, grpcio, google-pasta, astunparse, requests-oauthlib, markdown, google-auth, coremltools, google-auth-oauthlib, tensorboard, tensorflow-macos
Successfully installed MarkupSafe-2.1.1 absl-py-1.2.0 astunparse-1.6.3 cachetools-5.2.0 charset-normalizer-2.1.0 coremltools-6.0b2 flatbuffers-2.0 gast-0.5.3 google-auth-2.10.0 google-auth-oauthlib-0.4.6 google-pasta-0.2.0 grpcio-1.48.0rc1 h5py-3.7.0 idna-3.3 importlib-metadata-4.12.0 keras-2.8.0 keras-preprocessing-1.1.2 libclang-14.0.6 markdown-3.4.1 mpmath-1.2.1 numpy-1.23.1 oauthlib-3.2.0 opt-einsum-3.3.0 packaging-21.3 protobuf-3.20.0 pyasn1-0.5.0rc1 pyasn1-modules-0.3.0rc1 pyparsing-3.0.9 requests-2.28.1 requests-oauthlib-1.3.1 rsa-4.9 six-1.16.0 sympy-1.10.1 tensorboard-2.8.0 tensorboard-data-server-0.6.1 tensorboard-plugin-wit-1.8.1 tensorflow-macos-2.8.0 termcolor-1.1.0 tf-estimator-nightly-2.8.0.dev2021122109 tqdm-4.64.0 typing-extensions-4.3.0 urllib3-1.26.11 werkzeug-2.2.2 wrapt-1.14.1 zipp-3.8.1
(foo) paxbun@PAXBUN-MAC conversion % python main.py
...
predicting...
zsh: segmentation fault python main.py
(foo) paxbun@PAXBUN-MAC conversion % /Users/paxbun/anaconda3/envs/foo/lib/python3.9/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d '
(foo) paxbun@PAXBUN-MAC conversion % echo $?
130
I still cannot reproduce this issue. On macOS 12.3, I can get predictions from your Core ML model just fine.
If this worked in macOS 12.3 but stopped working in 12.5, then it's an issue with the Core ML framework.
Please report this issue here: https://developer.apple.com/bug-reporting/.