Always load pretrained weights of the first convolution layer
🛠️ PR Summary
Made with ❤️ by Ultralytics Actions
🌟 Summary
Improves how pretrained model weights are loaded, making the process more robust and compatible with models that use different input-channel counts.
📊 Key Changes
- Enhances the model loading function to better handle cases where the first convolutional layer's weights don't match exactly.
- Adds logic to partially load weights for the first convolutional layer if input channels differ, instead of skipping or failing.
🎯 Purpose & Impact
- Ensures smoother transfer of pretrained weights, especially when models have different input channel configurations (e.g., grayscale vs. RGB).
- Reduces errors and manual intervention when loading weights from different sources.
- Makes model customization and experimentation easier for users, improving flexibility and user experience.
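The partial-loading behavior described above can be sketched roughly as follows. This is a minimal illustration, not the actual Ultralytics implementation; `adapt_first_conv` and its tiling heuristic (repeat the channel-mean kernel) are hypothetical:

```python
import torch


def adapt_first_conv(pretrained_w: torch.Tensor, target_ch: int) -> torch.Tensor:
    """Adapt a pretrained conv weight of shape (out, in, kH, kW) to a new
    input-channel count. Heuristic: average over the pretrained input
    channels, then tile that mean kernel across the target channels.
    Kernel size must already match."""
    out_ch, in_ch, kh, kw = pretrained_w.shape
    if in_ch == target_ch:
        return pretrained_w.clone()
    # Mean over pretrained input channels, repeated once per new channel.
    mean_kernel = pretrained_w.mean(dim=1, keepdim=True)  # (out, 1, kH, kW)
    return mean_kernel.repeat(1, target_ch, 1, 1)


# Example: an RGB-pretrained 3x3 stem adapted to 10 multispectral channels.
w_rgb = torch.randn(16, 3, 3, 3)
w_ms = adapt_first_conv(w_rgb, target_ch=10)
print(w_ms.shape)  # torch.Size([16, 10, 3, 3])
```

Averaging before tiling keeps the adapted stem's expected activations roughly comparable to the pretrained one when the new channels carry similar statistics.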
👋 Hello @Laughing-q, thank you for submitting an ultralytics/ultralytics 🚀 PR! This is an automated response to help streamline the review process. An Ultralytics engineer will also review your contribution soon.
Please take a moment to review the following checklist to ensure your PR is ready for integration:
- ✅ Define a Purpose: Clearly explain the purpose of your fix or feature in your PR description, and link to any relevant issues. Ensure your commit messages are clear, concise, and adhere to the project's conventions.
- ✅ Synchronize with Source: Confirm your PR is synchronized with the ultralytics/ultralytics `main` branch. If it's behind, update it by clicking the 'Update branch' button or by running `git pull` and `git merge main` locally.
- ✅ Ensure CI Checks Pass: Verify all Ultralytics Continuous Integration (CI) checks are passing. If any checks fail, please address the issues.
- ✅ Update Documentation: Update the relevant documentation for any new or modified features.
- ✅ Add Tests: If applicable, include or update tests to cover your changes, and confirm that all tests are passing.
- ✅ Sign the CLA: Please ensure you have signed our Contributor License Agreement if this is your first Ultralytics PR by writing "I have read the CLA Document and I sign the CLA" in a new message.
- ✅ Minimize Changes: Limit your changes to the minimum necessary for your bug fix or feature addition. "It is not daily increase but daily decrease, hack away the unessential. The closer to the source, the less wastage there is." – Bruce Lee
For more guidance, please refer to our Contributing Guide. If your PR addresses a bug and you haven't already, please provide a minimum reproducible example (MRE) to help us verify the fix.
Don't hesitate to leave a comment if you have any questions. Thank you for contributing to Ultralytics! 🚀
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 73.66%. Comparing base (`d64b6f2`) to head (`874349b`). Report is 1 commit behind head on main.
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main   #20567      +/-   ##
==========================================
- Coverage   73.69%   73.66%   -0.03%
==========================================
  Files         144      144
  Lines       19163    19172       +9
==========================================
+ Hits        14123    14124       +1
- Misses       5040     5048       +8
```
| Flag | Coverage Δ | |
|---|---|---|
| Benchmarks | 31.16% <7.69%> (-0.05%) | :arrow_down: |
| GPU | 36.65% <53.84%> (-0.01%) | :arrow_down: |
| Tests | 67.77% <100.00%> (+<0.01%) | :arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
@glenn-jocher For multi-channel training we currently do not load the first conv layer of the pretrained weights because of the shape mismatch. The knowledge in that first conv layer is important, since it affects the outputs of every subsequent layer; re-initializing the first layer's parameters makes the loaded weights in later layers less meaningful. This PR therefore loads the first conv layer as well, even when the input-channel count is mismatched, as long as the kernel size matches.
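As a rough sketch of this idea (a hypothetical `load_partial` helper, not the actual Ultralytics loading code, which differs in detail), a loader can copy exactly matching tensors and partially load a conv whose only mismatch is the input-channel dimension:

```python
import torch
import torch.nn as nn


def load_partial(model: nn.Module, ckpt_sd: dict) -> int:
    """Copy matching checkpoint tensors into `model`. For a conv weight whose
    only mismatch is the input-channel dim (output channels and kernel size
    match), tile the channel-mean kernel instead of skipping the layer.
    Returns the number of tensors loaded."""
    model_sd = model.state_dict()
    loaded = 0
    with torch.no_grad():
        for k, v in ckpt_sd.items():
            if k not in model_sd:
                continue
            tgt = model_sd[k]
            if v.shape == tgt.shape:
                tgt.copy_(v)  # exact match: copy as-is
                loaded += 1
            elif v.ndim == 4 and v.shape[0] == tgt.shape[0] and v.shape[2:] == tgt.shape[2:]:
                # Input-channel mismatch only: average then tile the kernel.
                tgt.copy_(v.mean(dim=1, keepdim=True).repeat(1, tgt.shape[1], 1, 1))
                loaded += 1
    return loaded


# Example: a 3-channel pretrained stem loaded into a 10-channel model.
src = nn.Conv2d(3, 16, 3)
dst = nn.Conv2d(10, 16, 3)
n_loaded = load_partial(dst, src.state_dict())
print(n_loaded)  # 2 (weight partially loaded, bias copied exactly)
```

This keeps the pretrained stem's learned filters in play for every downstream layer instead of discarding them whenever the channel counts differ.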
A brief comparison running yolo11n.pt on coco8-multispectral.yaml for 50 epochs:

```bash
yolo train model=yolo11n.pt data=coco8-multispectral.yaml epochs=50
```
```
# main
YOLO11n summary (fused): 100 layers, 2,617,256 parameters, 0 gradients, 6.7 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 1/1 [00:00<00:00, 96.18it/s]
all 4 17 0.994 0.167 0.17 0.119
person 3 10 1 0 0.0268 0.0161
dog 1 1 1 0 0 0
horse 1 2 1 0 0 0
elephant 1 2 1 0 0 0
umbrella 1 1 0.961 1 0.995 0.697
potted plant 1 1 1 0 0 0

# this PR
YOLO11n summary (fused): 100 layers, 2,617,256 parameters, 0 gradients, 6.7 GFLOPs
Class Images Instances Box(P R mAP50 mAP50-95): 100%|██████████| 1/1 [00:00<00:00, 87.36it/s]
all 4 17 0.885 0.358 0.683 0.425
person 3 10 0.659 0.2 0.261 0.104
dog 1 1 1 0 0.332 0.0663
horse 1 2 1 0.948 0.995 0.628
elephant 1 2 1 0 0.52 0.26
umbrella 1 1 0.654 1 0.995 0.796
potted plant 1 1 1 0 0.995 0.697
```
FYI, a client also asked me about supporting this. CC: @ambitious-octopus
Fantastic work, @Laughing-q and @glenn-jocher! 🚀 This important update makes model weight loading in Ultralytics models even more flexible, especially for multi-channel training. As Thomas Edison once said, "There is a way to do it better – find it." Your improvements embody this spirit, making custom training on unique datasets easier and more intuitive for everyone. Thank you for your dedication and for driving Ultralytics forward!