Add more models as part of GA models
Summary
Added the following new representative models to the ExecuTorch examples:
- EfficientNet-B4: Image classification with CNN architecture
- DETR-ResNet50: Object detection using transformer decoder
- SegFormer-ADE: Semantic segmentation transformer
- Swin2SR: Super-resolution with Swin transformer
- ALBERT: Lightweight BERT for NLP tasks
- TrOCR: Optical character recognition transformer
- Wav2Vec2: Cross-lingual speech representation learning
All models include XNNPACK backend support with appropriate quantization configurations and full CI integration.
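For reference, the lowering path these examples exercise looks roughly like the sketch below. This is a minimal, illustrative sketch using torchvision's efficientnet_b4 as a stand-in; the actual example scripts, input shapes, and per-model quantization configs in this PR may differ, and the PT2E quantization step is omitted for brevity.

```python
import torch
import torchvision

from executorch.backends.xnnpack.partition.xnnpack_partitioner import XnnpackPartitioner
from executorch.exir import to_edge_transform_and_lower

# Stand-in model: torchvision's EfficientNet-B4 (the example in the PR may load it differently).
model = torchvision.models.efficientnet_b4(weights="DEFAULT").eval()
example_inputs = (torch.randn(1, 3, 380, 380),)  # illustrative input size

# Capture the model, then lower XNNPACK-supported subgraphs to the XNNPACK delegate.
exported = torch.export.export(model, example_inputs)
edge = to_edge_transform_and_lower(exported, partitioner=[XnnpackPartitioner()])
executorch_program = edge.to_executorch()

# Serialize to a .pte file that the ExecuTorch runtime can load.
with open("efficientnet_b4_xnnpack.pte", "wb") as f:
    f.write(executorch_program.buffer)
```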
Test plan:
- Validate model export and execution with portable backend
- Test XNNPACK delegation and quantization (with appropriate exclusions)
- Integrate into CI workflows for automated testing
- Verify all models perform their intended tasks accurately
:link: Helpful Links
:test_tube: See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12340
:x: 4 New Failures, 1 Cancelled Job, 7 Unrelated Failures
As of commit 05a413448064f03ac159ac0c6c830999a8e8efd6 with merge base f82c2f087baf00746c26af4eb1e60f0f0ac2e96e:
NEW FAILURES - The following jobs have failed:
- Lint / lintrunner / linux-job (gh)
  >>> Lint for examples/models/wav2vec2/model.py
- trunk / test-llama-runner-mac (fp32, coreml) / macos-job (gh)
  RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 1
- trunk / test-models-linux-aarch64 (wav2vec2, portable, linux.arm64.2xlarge) / linux-job (gh)
  AttributeError: type object 'Wav2Vec2Model' has no attribute 'from_pretrained'
- trunk / test-models-linux-aarch64 (wav2vec2, xnnpack-quantization-delegation, linux.arm64.2xlarge) / linux-job (gh)
  AttributeError: type object 'Wav2Vec2Model' has no attribute 'from_pretrained'
CANCELLED JOB - The following job was cancelled. Please retry:
- trunk / test-models-linux-aarch64 (detr_resnet50, portable, linux.arm64.2xlarge) / linux-job (gh)
  ##[error]The operation was canceled.
BROKEN TRUNK - The following jobs failed but were present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
- pull / unittest / linux / linux-job (gh) (trunk failure)
  devtools/inspector/tests/inspector_utils_test.py::TestInspectorUtils::test_equip_debug_handle_to_export_program_success
- pull / unittest / macos / macos-job (gh) (trunk failure)
  devtools/inspector/tests/inspector_utils_test.py::TestInspectorUtils::test_equip_debug_handle_to_export_program_success
- pull / unittest-editable / linux / linux-job (gh) (trunk failure)
  devtools/inspector/tests/inspector_utils_test.py::TestInspectorUtils::test_equip_debug_handle_to_export_program_success
- pull / unittest-editable / macos / macos-job (gh) (trunk failure)
  devtools/inspector/tests/inspector_utils_test.py::TestInspectorUtils::test_equip_debug_handle_to_export_program_success
- trunk / test-llama-torchao-lowbit / macos-job (gh) (trunk failure)
  RuntimeError: Command bash /Users/ec2-user/runner/_work/_temp/exec_script failed with exit code 134
- trunk / unittest-release / linux / linux-job (gh) (trunk failure)
  devtools/inspector/tests/inspector_utils_test.py::TestInspectorUtils::test_equip_debug_handle_to_export_program_success
- trunk / unittest-release / macos / macos-job (gh) (trunk failure)
  devtools/inspector/tests/inspector_utils_test.py::TestInspectorUtils::test_equip_debug_handle_to_export_program_success
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Test failures don't look related.
swin2sr_2x and trocr_handwritten are not exporting yet. Disabled for now. https://github.com/pytorch/executorch/issues/12365
These models need to be exported with strict=False to be exportable.
However, when enabling strict=False by default (https://github.com/pytorch/executorch/pull/12368), there are a few failures, namely ic3, ic4, and llama: https://github.com/pytorch/executorch/issues/12370
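For context, the workaround under discussion is non-strict export. Below is a minimal sketch, assuming a public Hugging Face Swin2SR checkpoint and an illustrative input shape (the example's actual model id and inputs may differ):

```python
import torch
from transformers import Swin2SRForImageSuperResolution

# Illustrative checkpoint; the example in the PR may pin a different one.
model = Swin2SRForImageSuperResolution.from_pretrained(
    "caidas/swin2SR-classical-sr-x2-64"
).eval()
example_inputs = (torch.randn(1, 3, 64, 64),)  # low-resolution input; shape is illustrative

# strict=True (the default here) fails to capture these models; non-strict export
# relaxes the Python-level tracing checks and gets past the graph-capture errors.
exported_program = torch.export.export(model, example_inputs, strict=False)
```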
How were these specific models selected?
The other question I would have is: what is the goal of adding more models? Do we just want to claim enablement, or at least provide good performance via XNNPACK lowering? Because this might end up expanding the work that needs to happen; not 100% sure, but just calling it out.
I do agree on covering more vision/object detection models.
Thanks @kimishpatel
How were these specific models selected?
So, I followed this process: I looked at the models on https://ai-benchmark.com/ranking.html and https://mlcommons.org/benchmarks/inference-mobile/, and checked whether we could enable anything we didn't already have. However, it was difficult to find "official" PyTorch and/or Hugging Face implementations; most of them are TFLite or ONNX implementations. For some, like EfficientNet-B4 and ALBERT, we were able to find PyTorch implementations.
So, I swapped in similar models that already have PyTorch/Hugging Face implementations:
- Object detection: swapped YOLOv4 and SSD-MobileNetV2 with DETR-ResNet50
- Semantic segmentation: swapped DeepLabV3+ with SegFormer-ADE (even though we already have DeepLabV3)
- Image super-resolution: swapped ESRGAN with Swin2SR
- OCR: picked TrOCR
The other question I would have is: what is the goal of adding more models?
Mainly inspired by ai-benchmark and mlcommons. The goal would be to cover specific tasks that we don't have coverage for. We won't be adding many models, only representative models within each task. There are still a few important ones missing, like depth estimation and video super-resolution.
For broader model coverage within specific tasks, we can expand by leveraging optimum-executorch instead.
Any specific reason that ai-benchmark and mlcommons were picked as references for the model list? Just curious, because there are lists from other sources.
@mergennachin thanks for the clarification. And I'm happy to discuss how we can make a joint effort on getting more task/model coverage, similar to what's happening in optimum-et. Some models, like EfficientNet, ALBERT, and Swin, were added there already.
I looked at the models on https://ai-benchmark.com/ranking.html and https://mlcommons.org/benchmarks/inference-mobile/, and checked whether we could enable anything we didn't already have. However, it was difficult to find "official" PyTorch and/or Hugging Face implementations; most of them are TFLite or ONNX implementations.
Looks like the main motivation is to enable ai-benchmark to use ET-generated models? I recall in the meeting with them, they mentioned that they deploy exactly the same model across all devices and that the expense of switching to a different variant of a model is high. Given that, would adding similar but not exactly the same models help with adoption by ai-benchmark?
There are still a few important ones missing, like depth estimation and video super-resolution.
I recall there are depth estimation models enabled in HF transformers (e.g., DepthAnything). Not sure about video super-resolution (HF doesn't have a task classification for image/video super-resolution).
The goal would be to cover specific tasks that we don't have coverage for. We won't be adding many models, only representative models within each task.
What other tasks would we like to cover for GA? Would it be a good idea to browse the top 1-2 most popular models per task category on the Hugging Face Hub?
What other tasks would we like to cover for GA? Would it be a good idea to browse the top 1-2 most popular models per task category on the Hugging Face Hub?
@guangy10 here's what I compiled: https://github.com/pytorch/executorch/issues/12378
Any specific reason that ai-benchmark and mlcommons were picked as references for the model list? Just curious, because there are lists from other sources.
Popularity? Please include other popular lists here so we can try them as well.
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as Stale.
Feel free to remove the Stale label if you feel this was a mistake.
If you are unable to remove the Stale label please contact a maintainer in order to do so.
If you want the bot to never mark this PR stale again, add the no-stale label. Stale pull requests will automatically be closed after 30 days of inactivity.