
Official code repo of ICLR'25 paper: MamBEV: Enabling State Space Models to Learn Birds-Eye-View Representations

MamBEV: Enabling State Space Models to Learn Birds-Eye-View Representations


MamBEV: Enabling State Space Models to Learn Birds-Eye-View Representations, ICLR 2025, Paper

Abstract

3D visual perception tasks, such as 3D detection from multi-camera images, are essential components of autonomous driving and assistance systems. However, designing computationally efficient methods remains a significant challenge. In this paper, we propose a Mamba-based framework called MamBEV, which learns unified Bird's Eye View (BEV) representations using linear spatio-temporal SSM-based attention. This approach supports multiple 3D perception tasks with significantly improved computational and memory efficiency. Furthermore, we introduce SSM-based cross-attention, analogous to standard cross-attention, in which BEV query representations can interact with relevant image features. Extensive experiments demonstrate MamBEV's promising performance across diverse visual perception metrics, highlighting its advantages in input-scaling efficiency compared to existing benchmark models.
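To make the core idea concrete, here is a minimal sketch of linear (kernelized) cross-attention between BEV queries and image features. This is a generic linear-attention formulation with an `elu(x) + 1` feature map, not the paper's exact SSM-based mechanism; all tensor names and shapes are illustrative assumptions. It shows why cost scales linearly with the number of image tokens: the keys and values are first compressed into a fixed-size summary, which every query then reads.

```python
import torch
import torch.nn.functional as F

def linear_cross_attention(q, k, v, eps=1e-6):
    """Kernelized cross-attention, O(N) in the number of image tokens.

    q: (B, Nq, D) BEV queries; k, v: (B, Nk, D) flattened image features.
    Uses the elu(x) + 1 feature map so attention weights stay positive.
    """
    phi_q = F.elu(q) + 1.0                                     # (B, Nq, D)
    phi_k = F.elu(k) + 1.0                                     # (B, Nk, D)
    # Compress all image tokens into a fixed-size (D x D) summary.
    kv = torch.einsum("bnd,bne->bde", phi_k, v)                # (B, D, D)
    # Per-query normalizer (replaces the softmax denominator).
    z = torch.einsum("bnd,bd->bn", phi_q, phi_k.sum(dim=1))    # (B, Nq)
    out = torch.einsum("bnd,bde->bne", phi_q, kv) / (z.unsqueeze(-1) + eps)
    return out

# Toy shapes: 4 BEV queries attend to 16 image tokens of dimension 8.
q = torch.randn(1, 4, 8)
k = torch.randn(1, 16, 8)
v = torch.randn(1, 16, 8)
out = linear_cross_attention(q, k, v)
print(out.shape)  # torch.Size([1, 4, 8])
```

Because `kv` has a fixed size independent of `Nk`, doubling the number of camera tokens only doubles the summarization cost, rather than quadrupling the attention cost as in standard softmax cross-attention.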

Overall Architecture


Getting Started

  • Installation
  • Prepare Dataset
  • Run and Eval

Model Zoo

| Backbone | Method | Lr Schd | NDS | mAP | Memory | Config | Download |
| --- | --- | --- | --- | --- | --- | --- | --- |
| R50 | MamBEV-tiny | 24ep | 39.9 | 26.6 | - | config | [model]/[log] |
| R101-DCN | MamBEV-small | 24ep | 52.5 | 42.2 | - | config | model/[log] |

Catalog

  • [ ] MamBEV-Base
  • [ ] MamBEV optimization: memory, speed, and inference
  • [x] MamBEV-Small and Tiny Release
  • [ ] 3D Detection checkpoints
  • [x] 3D Detection code
  • [x] Backbones

Support Us

If you find this work useful, please consider:

  • Starring the repository
  • Citing our paper
  • Contributing to the codebase

Bibtex

If this work is helpful for your research, please cite it using the following BibTeX entry.

@inproceedings{ke2025mambev,
  title={MamBEV: Enabling State Space Models to Learn Birds-Eye-View Representations},
  author={Ke, Hongyu and Morris, Jack and Oguchi, Kentaro and Cao, Xiaofei and Liu, Yongkang and Wang, Haoxin and Ding, Yi},
  booktitle={The Thirteenth International Conference on Learning Representations},
  year={2025}
}

Acknowledgement

Built on the shoulders of giants. Many thanks to these excellent open-source projects: