TransXNet
TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition
This is an official PyTorch implementation of "TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition".
Introduction
TransXNet is a CNN-Transformer hybrid vision backbone that models both global and local dynamics with a Dual Dynamic Token Mixer (D-Mixer), achieving superior performance over both CNN-based and Transformer-based models.
Image Classification
1. Requirements
We strongly recommend using the following dependency versions to ensure reproducibility:
# Environments:
cuda==11.6
python==3.8.15
# Packages:
mmcv==1.7.1
timm==0.6.12
torch==1.13.1
torchvision==0.14.1
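To confirm that an environment matches the pinned packages, a small check like the following may help. It is only a sketch: `check_versions` and `PINNED` are hypothetical names that are not part of this repository, and the lookup assumes each package is installed under the distribution name listed above (mmcv 1.x is often installed as `mmcv-full`, in which case the key would need adjusting).

```python
from importlib import metadata

# Versions pinned in the requirements list above.
PINNED = {
    "mmcv": "1.7.1",
    "timm": "0.6.12",
    "torch": "1.13.1",
    "torchvision": "0.14.1",
}


def check_versions(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for each missing or mismatched package."""
    problems = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Local build tags such as "1.13.1+cu116" are compared on the part before "+".
        if found is None or found.split("+")[0] != expected:
            problems[name] = (expected, found)
    return problems


if __name__ == "__main__":
    for name, (expected, found) in check_versions(PINNED).items():
        print(f"{name}: expected {expected}, found {found}")
```

An empty result means every pinned package was found at the expected version.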
2. Data Preparation
Prepare ImageNet with the following folder structure. You can extract ImageNet with this script.
│imagenet/
├──train/
│ ├── n01440764
│ │ ├── n01440764_10026.JPEG
│ │ ├── n01440764_10027.JPEG
│ │ ├── ......
│ ├── ......
├──val/
│ ├── n01440764
│ │ ├── ILSVRC2012_val_00000293.JPEG
│ │ ├── ILSVRC2012_val_00002138.JPEG
│ │ ├── ......
│ ├── ......
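Before training, it can be useful to check that the extracted dataset matches the layout above. The sketch below counts class folders and `.JPEG` files per split. `summarize_imagenet` is a hypothetical helper written for illustration and is not part of this repository.

```python
from pathlib import Path


def summarize_imagenet(root):
    """Return {split: (num_class_folders, num_jpeg_files)} for train/ and val/."""
    summary = {}
    for split in ("train", "val"):
        split_dir = Path(root) / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing split folder: {split_dir}")
        # Each class is one sub-folder named by its WordNet ID, e.g. n01440764.
        class_dirs = [d for d in split_dir.iterdir() if d.is_dir()]
        n_images = sum(
            1
            for d in class_dirs
            for f in d.iterdir()
            if f.suffix.upper() == ".JPEG"
        )
        summary[split] = (len(class_dirs), n_images)
    return summary


# Example call (the path is a placeholder):
# print(summarize_imagenet("/path/to/imagenet"))
```

For a complete ImageNet-1K copy, each split should contain 1000 class folders, with 1,281,167 training images and 50,000 validation images.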
3. Main Results on ImageNet with Pretrained Models
| Models | Input Size | FLOPs (G) | Params (M) | Top-1 Acc.(%) | Download |
|---|---|---|---|---|---|
| TransXNet-T | 224x224 | 1.8 | 12.8 | 81.6 | model |
| TransXNet-S | 224x224 | 4.5 | 26.9 | 83.8 | model |
| TransXNet-B | 224x224 | 8.3 | 48.0 | 84.6 | model |
4. Train
To train TransXNet models on ImageNet-1K with 8 GPUs (single node), run:
bash scripts/train_tiny.sh # train TransXNet-T
bash scripts/train_small.sh # train TransXNet-S
bash scripts/train_base.sh # train TransXNet-B
5. Validation
To evaluate TransXNet on ImageNet-1K, run:
MODEL=transxnet_t # transxnet_{t, s, b}
python3 validate.py \
/path/to/imagenet \
--model $MODEL -b 128 \
--pretrained # or --checkpoint /path/to/checkpoint
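To evaluate all three variants in one go, the command above can be wrapped in a short driver. This is a sketch that only reuses the flags shown above; `build_validate_cmd` and `run_all` are hypothetical helpers rather than part of this repository, and the script is assumed to be run from the repository root where `validate.py` lives.

```python
import subprocess

# Model names follow the transxnet_{t, s, b} pattern from the command above.
VARIANTS = ("transxnet_t", "transxnet_s", "transxnet_b")


def build_validate_cmd(model, data_dir, batch_size=128, checkpoint=None):
    """Build the validate.py command line for one model variant."""
    cmd = ["python3", "validate.py", data_dir, "--model", model, "-b", str(batch_size)]
    if checkpoint is None:
        cmd.append("--pretrained")  # use the released pretrained weights
    else:
        cmd += ["--checkpoint", checkpoint]  # use a local checkpoint instead
    return cmd


def run_all(data_dir):
    """Run validation for every variant, stopping at the first failure."""
    for model in VARIANTS:
        subprocess.run(build_validate_cmd(model, data_dir), check=True)


# Example call (the path is a placeholder):
# run_all("/path/to/imagenet")
```

Building each command as a list, rather than a single shell string, avoids quoting problems when the dataset path contains spaces.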
Object Detection and Semantic Segmentation
Object Detection
Semantic Segmentation
Citation
If you find this project useful for your research, please consider citing:
@article{lou2023transxnet,
title={TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition},
author={Lou, Meng and Zhou, Hong-Yu and Yang, Sibei and Yu, Yizhou},
journal={arXiv preprint arXiv:2310.19380},
year={2023}
}
Acknowledgment
Our implementation is mainly based on the following codebases. We sincerely thank the authors for their wonderful work.
Contact
If you have any questions, please feel free to create issues or contact me at [email protected].