RFC-0042-aecf-multimodal-fusion.md
Summary
We propose adding Adaptive Entropy-Gated Contrastive Fusion (AECF) to PyTorch as a core multimodal fusion layer that addresses a critical production problem: missing modalities in real-world deployments.
The Problem
Current multimodal models fail catastrophically when sensors break, data is incomplete, or modalities are unavailable at inference time. This is a major barrier to deploying multimodal AI in production environments.
The Solution
AECF uses entropy-driven curriculum learning to train models that are robust to missing modalities (a sketch of the gating rule follows this list):
- High attention entropy → Less masking → Easier learning
- Low attention entropy → More masking → Robustness training
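As a rough sketch of this gating rule (the function name and the mapping from entropy to masking probability are illustrative assumptions, not the RFC's confirmed implementation), the per-sample masking probability can be driven by the normalized entropy of the attention weights over modalities:

```python
import torch

def entropy_gated_mask_prob(attn_weights: torch.Tensor,
                            p_min: float = 0.0,
                            p_max: float = 0.5) -> torch.Tensor:
    """Map attention entropy to a modality-masking probability.

    attn_weights: (batch, num_modalities) attention over modalities, rows sum to 1.
    Returns a per-sample masking probability in [p_min, p_max]:
    high entropy (attention spread evenly) -> less masking (easier learning),
    low entropy (attention collapsed onto one modality) -> more masking
    (robustness training against losing the dominant modality).
    """
    eps = 1e-8
    num_modalities = attn_weights.size(-1)
    entropy = -(attn_weights * (attn_weights + eps).log()).sum(dim=-1)
    max_entropy = torch.log(torch.tensor(float(num_modalities)))
    normalized = entropy / max_entropy              # in [0, 1]
    return p_min + (p_max - p_min) * (1.0 - normalized)

# Example: during training, drop each modality with the gated probability.
attn = torch.softmax(torch.randn(4, 3), dim=-1)     # toy attention over 3 modalities
p_mask = entropy_gated_mask_prob(attn)              # shape (4,)
keep = (torch.rand(4, 3) >= p_mask.unsqueeze(-1)).float()
```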
Key Results
- +18 percentage points mAP improvement when modalities are missing
- Substantially lower calibration error (reported as a 200% reduction)
- Only 1% runtime overhead
- Drop-in replacement for existing fusion layers
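To make the "drop-in replacement" point concrete, the sketch below shows the kind of call site that would be swapped out; `AECFFusion` and its constructor arguments are hypothetical placeholders for illustration, not the RFC's confirmed API.

```python
import torch
import torch.nn as nn

class ConcatFusion(nn.Module):
    """Baseline fusion: concatenate per-modality embeddings and project."""
    def __init__(self, dim: int, num_modalities: int):
        super().__init__()
        self.proj = nn.Linear(dim * num_modalities, dim)

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        return self.proj(torch.cat(feats, dim=-1))

# Swapping in the proposed layer would keep the call sites unchanged:
#   fusion = AECFFusion(dim=512, num_modalities=2)   # hypothetical name/signature
#   fused = fusion([image_emb, text_emb])            # tolerates missing modalities
```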
Implementation
A complete reference implementation (5,337 lines of production-ready code), comprehensive tests, and MS-COCO benchmarks are included in the RFC.
Why This Matters
Multimodal AI is rapidly expanding (vision-language models, robotics, autonomous vehicles), but robustness to missing modalities remains an unsolved problem. AECF provides a principled, efficient solution that PyTorch users need today.
Request: Please route to multimodal/vision experts for technical review.
Hi @leochlon!
Thank you for your pull request and welcome to our community.
Action Required
In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.
Process
In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient, and your employer may need to sign the corporate CLA.
Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.
If you have received this in error or have any questions, please contact us at [email protected]. Thanks!
Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
Hey @leochlon, thanks for the request! Note that we maintain a very high bar for inclusion of new modules within torch.nn, as each comes with a substantial maintenance cost on our end. In general, we will accept new modules if the underlying techniques have already achieved widespread adoption and there is a broad expectation that PyTorch will provide such a module. It's also beneficial if there are performance reasons why the module should be provided by PyTorch itself rather than in a third-party repo.
From what I can tell, this is a new technique (https://arxiv.org/html/2505.15417v1) that needs time to establish user acceptance. I'd encourage you to maintain an implementation of this technique in a separate GitHub repo to make it available to users. We can leave this issue open to gauge user interest over time and revisit it in the future if the technique becomes ubiquitous. Please let us know if there is some technical reason why it is not possible to maintain this in a separate repo, so we can evaluate the extension mechanisms we provide within PyTorch.
I'll also tag @NicolasHug from torchvision here in case he has any thoughts.