Always load pretrained weights of the first convolution layer
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
🌟 Summary
Improves how pretrained model weights are loaded, making the process more robust and compatible with models that use different input-channel counts.
📊 Key Changes
- Enhances the model loading function to better handle cases where the first convolutional layer's weights don't match exactly.
- Adds logic to partially load weights for the first convolutional layer if input channels differ, instead of skipping or failing.
🎯 Purpose & Impact
- Ensures smoother transfer of pretrained weights, especially when models have different input channel configurations (e.g., grayscale vs. RGB).
- Reduces errors and manual intervention when loading weights from different sources.
- Makes model customization and experimentation easier for users, improving flexibility and user experience.
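The partial-loading behavior described above can be sketched roughly as follows. This is a minimal illustration, not the actual Ultralytics implementation; `adapt_first_conv` and its tiling heuristic (repeat the channel-mean kernel) are hypothetical:

```python
import torch


def adapt_first_conv(pretrained_w: torch.Tensor, target_ch: int) -> torch.Tensor:
    """Adapt a pretrained conv weight of shape (out, in, kH, kW) to a new
    input-channel count. Heuristic: average over the pretrained input
    channels, then tile that mean kernel across the target channels.
    Kernel size must already match."""
    out_ch, in_ch, kh, kw = pretrained_w.shape
    if in_ch == target_ch:
        return pretrained_w.clone()
    # Mean over pretrained input channels, repeated once per new channel.
    mean_kernel = pretrained_w.mean(dim=1, keepdim=True)  # (out, 1, kH, kW)
    return mean_kernel.repeat(1, target_ch, 1, 1)


# Example: an RGB-pretrained 3x3 stem adapted to 10 multispectral channels.
w_rgb = torch.randn(16, 3, 3, 3)
w_ms = adapt_first_conv(w_rgb, target_ch=10)
print(w_ms.shape)  # torch.Size([16, 10, 3, 3])
```

Averaging before tiling keeps the adapted stem's expected activations roughly comparable to the pretrained one when the new channels carry similar statistics.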
👋 Hello @Laughing-q, thank you for submitting an ultralytics/ultralytics 🚀 PR! This is an automated response to help streamline the review process. An Ultralytics engineer will also review your contribution soon.
Please take a moment to review the following checklist to ensure your PR is ready for integration:
- ✅ Define a Purpose: Clearly explain the purpose of your fix or feature in your PR description, and link to any relevant issues. Ensure your commit messages are clear, concise, and adhere to the project's conventions.
- ✅ Synchronize with Source: Confirm your PR is synchronized with the ultralytics/ultralytics `main` branch. If it's behind, update it by clicking the 'Update branch' button or by running `git pull` and `git merge main` locally.
- ✅ Ensure CI Checks Pass: Verify all Ultralytics Continuous Integration (CI) checks are passing. If any checks fail, please address the issues.
- ✅ Update Documentation: Update the relevant documentation for any new or modified features.
- ✅ Add Tests: If applicable, include or update tests to cover your changes, and confirm that all tests are passing.
- ✅ Sign the CLA: Please ensure you have signed our Contributor License Agreement if this is your first Ultralytics PR by writing "I have read the CLA Document and I sign the CLA" in a new message.
- ✅ Minimize Changes: Limit your changes to the minimum necessary for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." – Bruce Lee
For more guidance, please refer to our Contributing Guide. If your PR addresses a bug and you haven't already, please provide a minimum reproducible example (MRE) to help us verify the fix.
Don't hesitate to leave a comment if you have any questions. Thank you for contributing to Ultralytics! 🚀
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 73.66%. Comparing base (`d64b6f2`) to head (`874349b`). Report is 1 commit behind head on main.
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main   #20567      +/-   ##
==========================================
- Coverage   73.69%   73.66%   -0.03%
==========================================
  Files         144      144
  Lines       19163    19172       +9
==========================================
+ Hits        14123    14124       +1
- Misses       5040     5048       +8
```
| Flag | Coverage Δ | |
|---|---|---|
| Benchmarks | 31.16% <7.69%> (-0.05%) | :arrow_down: |
| GPU | 36.65% <53.84%> (-0.01%) | :arrow_down: |
| Tests | 67.77% <100.00%> (+<0.01%) | :arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
@glenn-jocher For multi-channel training we currently do not load the first conv layer of the pretrained weights because of the shape mismatch. The knowledge in that first conv layer is important, since it affects the outputs of every subsequent layer; re-initializing the first layer's parameters makes the loaded weights in later layers less meaningful. This PR therefore loads the first conv layer as well, even when the input-channel count is mismatched, as long as the kernel size matches.
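As a rough sketch of this idea (a hypothetical `load_partial` helper, not the actual Ultralytics loading code, which differs in detail), a loader can copy exactly matching tensors and partially load a conv whose only mismatch is the input-channel dimension:

```python
import torch
import torch.nn as nn


def load_partial(model: nn.Module, ckpt_sd: dict) -> int:
    """Copy matching checkpoint tensors into `model`. For a conv weight whose
    only mismatch is the input-channel dim (output channels and kernel size
    match), tile the channel-mean kernel instead of skipping the layer.
    Returns the number of tensors loaded."""
    model_sd = model.state_dict()
    loaded = 0
    with torch.no_grad():
        for k, v in ckpt_sd.items():
            if k not in model_sd:
                continue
            tgt = model_sd[k]
            if v.shape == tgt.shape:
                tgt.copy_(v)  # exact match: copy as-is
                loaded += 1
            elif v.ndim == 4 and v.shape[0] == tgt.shape[0] and v.shape[2:] == tgt.shape[2:]:
                # Input-channel mismatch only: average then tile the kernel.
                tgt.copy_(v.mean(dim=1, keepdim=True).repeat(1, tgt.shape[1], 1, 1))
                loaded += 1
    return loaded


# Example: a 3-channel pretrained stem loaded into a 10-channel model.
src = nn.Conv2d(3, 16, 3)
dst = nn.Conv2d(10, 16, 3)
n_loaded = load_partial(dst, src.state_dict())
print(n_loaded)  # 2 (weight partially loaded, bias copied exactly)
```

This keeps the pretrained stem's learned filters in play for every downstream layer instead of discarding them whenever the channel counts differ.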
A brief comparison running yolo11n.pt on coco8-multispectral.yaml for 50 epochs:

```bash
yolo train model=yolo11n.pt data=coco8-multispectral.yaml epochs=50
```
```
# main
YOLO11n summary (fused): 100 layers, 2,617,256 parameters, 0 gradients, 6.7 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 1/1 [00:00<00:00, 96.18it/s]
all 4 17 0.994 0.167 0.17 0.119
person 3 10 1 0 0.0268 0.0161
dog 1 1 1 0 0 0
horse 1 2 1 0 0 0
elephant 1 2 1 0 0 0
umbrella 1 1 0.961 1 0.995 0.697
potted plant 1 1 1 0 0 0

# this PR
YOLO11n summary (fused): 100 layers, 2,617,256 parameters, 0 gradients, 6.7 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 1/1 [00:00<00:00, 87.36it/s]
all 4 17 0.885 0.358 0.683 0.425
person 3 10 0.659 0.2 0.261 0.104
dog 1 1 1 0 0.332 0.0663
horse 1 2 1 0.948 0.995 0.628
elephant 1 2 1 0 0.52 0.26
umbrella 1 1 0.654 1 0.995 0.796
potted plant 1 1 1 0 0.995 0.697
```
FYI, a client also asked me about supporting this. CC: @ambitious-octopus
Fantastic work, @Laughing-q and @glenn-jocher! 🚀 This important update makes model weight loading in Ultralytics models even more flexible, especially for multi-channel training. As Thomas Edison once said, "There is a way to do it better – find it." Your improvements embody this spirit, making custom training on unique datasets easier and more intuitive for everyone. Thank you for your dedication and for driving Ultralytics forward!