Conversion of ESPNetv2 semantic segmentation model from PyTorch to CoreML format using coremltools, taking very long
Describing the bug
Greetings, I have been trying to convert the ESPNetv2 model (a semantic segmentation model implemented in PyTorch) to CoreML format using coremltools. However, the conversion takes extremely long, and unless I use a high-end GPU (like a 3090), it effectively freezes to the point where I cannot tell whether the conversion is even still running.
This is not happening with other models such as MobileNet, or BiSeNetv2 (another semantic segmentation model implemented in PyTorch).
Based on the logs and the code below, may I get some insight into why this might be happening?
Stack Trace
Currently, as you can see, I am getting several TracerWarnings that seem unavoidable if I want to run the model in a safe manner.
Loading image: ./data/test.jpg
2025-05-02 04:12:49 - WARNING - Training from scratch!!
/Users/himanshu/Desktop/Taskar Center for Accessible Technology/TCAT Python/ESPNetv2_CoreML_Conversion/coreml_conversion.py:47: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
self.model.load_state_dict(torch.load(args.weight_path, map_location=torch.device('cpu')), strict=False)
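As an aside, the FutureWarning above can be silenced by opting into the safer load path now. A minimal sketch (toy model and a placeholder path, not the repo's actual script): passing `weights_only=True` restricts unpickling to tensors and allowlisted types.

```python
import torch

# Toy model and throwaway path, purely for illustration.
model = torch.nn.Linear(4, 2)
torch.save(model.state_dict(), "/tmp/espnetv2_demo.pth")

# weights_only=True avoids the pickle FutureWarning and refuses to
# execute arbitrary objects embedded in the checkpoint.
state = torch.load("/tmp/espnetv2_demo.pth",
                   map_location=torch.device("cpu"),
                   weights_only=True)
model.load_state_dict(state, strict=False)
```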
/Users/himanshu/Desktop/Taskar Center for Accessible Technology/TCAT Python/ESPNetv2_CoreML_Conversion/nn_layers/eesp.py:134: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if w2 == w1:
/Users/himanshu/Desktop/Taskar Center for Accessible Technology/TCAT Python/ESPNetv2_CoreML_Conversion/nn_layers/eesp.py:84: TracerWarning: Converting a tensor to a Python boolean might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
if expanded.size() == input.size():
/Users/himanshu/Desktop/Taskar Center for Accessible Technology/TCAT Python/ESPNetv2_CoreML_Conversion/nn_layers/efficient_pyramid_pool.py:36: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
h_s = int(math.ceil(height * self.scales[i]))
/Users/himanshu/Desktop/Taskar Center for Accessible Technology/TCAT Python/ESPNetv2_CoreML_Conversion/nn_layers/efficient_pyramid_pool.py:37: TracerWarning: Converting a tensor to a Python float might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
w_s = int(math.ceil(width * self.scales[i]))
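For context, the TracerWarnings above mean the traced graph only records the branch that was taken during tracing. A small sketch of the same pattern (hypothetical module, not code from the repo) shows how the shape-dependent `if` gets baked in as a constant:

```python
import torch

class BranchOnShape(torch.nn.Module):
    # Mimics the pattern warned about in eesp.py: Python-level control
    # flow on tensor shapes, which torch.jit.trace freezes into the graph.
    def forward(self, x):
        if x.size(2) == 64:        # recorded as True during tracing
            return x * 2
        return x

traced = torch.jit.trace(BranchOnShape(), torch.randn(1, 3, 64, 64))

# Only the taken branch was recorded, so an input with a different
# height still goes through the doubling path.
y = torch.randn(1, 3, 32, 32)
print(torch.allclose(traced(y), y * 2))
```

This is why the trace "might not generalize to other inputs"; it does not by itself explain the slowness, but it is worth knowing when converting at a fixed input size.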
When both 'convert_to' and 'minimum_deployment_target' not specified, 'convert_to' is set to "mlprogram" and 'minimum_deployment_target' is set to ct.target.iOS15 (which is same as ct.target.macOS12). Note: the model will not run on systems older than iOS15/macOS12/watchOS8/tvOS15. In order to make your model run on older system, please set the 'minimum_deployment_target' to iOS14/iOS13. Details please see the link: https://apple.github.io/coremltools/docs-guides/source/target-conversion-formats.html
Converting PyTorch Frontend ==> MIL Ops: 86%|███████████████████████████ | 931/1079 [00:30<00:04, 36.94 ops/s]
To Reproduce
It is pretty difficult to add all the code into one discernible snippet, hence I created a minimal repository that contains all the required context: https://github.com/himanshunaidu/ESPNetv2_CoreML_Conversion
Please run coreml_conversion.py as described in the README, or:
python coreml_conversion.py \
--weight-path ./weights/espnetv2_s_2.0_city_1024x512.pth \
--im-size 1024 512 \
--s 2.0 \
--outpath model_zoo/ \
--img-path ./data/test.jpg \
--dataset city
System environment (please complete the following information):
- coremltools version: 8.3.0
- OS (e.g. MacOS version or Linux type): macOS Sonoma 14.6 (23G80)
- Any other relevant version information (e.g. PyTorch or TensorFlow version): Environment given in the repository (espnetv2_coreml_conversion.yml)
Using a high-end GPU shouldn't matter; conversion should be running entirely on the CPU.
Based on the output you shared, it looks like the slowdown is happening during the actual conversion, not during the post-conversion optimizations (which have been the source of slowdowns in the past).
How long is conversion actually taking? Ten minutes? Over an hour? Over two hours?
The coremltools version is not reported. Please try with our latest version (8.3.0) that we just released.
Greetings, apologies for not adding the coremltools version. I did try it with the latest coremltools==8.3.0 as well, as given in the environment file of the new repo: https://github.com/himanshunaidu/ESPNetv2_CoreML_Conversion/blob/main/espnetv2_coreml_conversion.yml Still didn't help.
As for the conversion time: I haven't actually managed to run the conversion successfully on my current Mac. I only remember that we managed a conversion quite a while ago, on a system that had a 4090 (not a 3090, sorry): https://github.com/himanshunaidu/EdgeNets/pull/1 But in hindsight, based on what you mentioned, the speed may have come from that system's i9 CPU rather than the GPU. On that system it took around 30 minutes.
Apologies again for the confusion.
Greetings, it has been a while since we communicated on this. Just bumping this issue in case it is possible to discuss it. Thank you!
It would help if you could give us a much smaller example that reproduces the issue, i.e. not an entire GitHub repository but a small amount of code that can be copied and pasted into a terminal.
Yeah, that is totally fair. Let me see if I can create a bare-minimum subset of the model that reproduces the same issue. I tried doing that quite a while ago, since it would have helped me narrow down the issue as well, but I ran into other problems that I failed to solve. Let me check on that again.