TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition

This is an official PyTorch implementation of "TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition".

Introduction

TransXNet is a CNN-Transformer hybrid vision backbone that models both global and local dynamics with a Dual Dynamic Token Mixer (D-Mixer), achieving better performance than both CNN-based and Transformer-based models.

Image Classification

1. Requirements

We strongly recommend using the dependency versions listed below to ensure reproducibility:

# Environments:
cuda==11.6
python==3.8.15
# Packages:
mmcv==1.7.1
timm==0.6.12
torch==1.13.1
torchvision==0.14.1
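
To confirm that an environment matches the versions pinned above, a check along the following lines may help. This is a minimal sketch and not part of the repository: `PINNED` and `find_mismatches` are hypothetical names, and the package names are taken verbatim from the list. If mmcv was installed under a different distribution name (for example `mmcv-full`), the lookup key would need adjusting.

```python
from importlib import metadata

# Pinned versions copied from the requirements list above.
PINNED = {
    "mmcv": "1.7.1",
    "timm": "0.6.12",
    "torch": "1.13.1",
    "torchvision": "0.14.1",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for every package that differs.

    `found` is None when the package is not installed.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Ignore local build tags such as "+cu116" when comparing.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found or 'not installed'}")
```

An empty result means every pinned package is installed at the expected version.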

2. Data Preparation

Prepare ImageNet with the following folder structure. You can extract ImageNet with this script.

│imagenet/
├──train/
│  ├── n01440764
│  │   ├── n01440764_10026.JPEG
│  │   ├── n01440764_10027.JPEG
│  │   ├── ......
│  ├── ......
├──val/
│  ├── n01440764
│  │   ├── ILSVRC2012_val_00000293.JPEG
│  │   ├── ILSVRC2012_val_00002138.JPEG
│  │   ├── ......
│  ├── ......
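
Before launching a long run, it can be worth confirming that the extracted dataset matches the layout shown above. The sketch below uses only the standard library; `summarize_split` and `check_imagenet_layout` are hypothetical helper names, not functions from this repository, and the check assumes every image sits directly inside its class folder with a `.JPEG` extension, as in the tree.

```python
from pathlib import Path


def summarize_split(split_dir):
    """Count class folders and .JPEG files under one split directory."""
    split_dir = Path(split_dir)
    class_dirs = sorted(p for p in split_dir.iterdir() if p.is_dir())
    num_images = sum(
        1 for d in class_dirs for f in d.iterdir() if f.suffix.upper() == ".JPEG"
    )
    return len(class_dirs), num_images


def check_imagenet_layout(root):
    """Return {split: (num_classes, num_images)}; raise if a split is missing."""
    root = Path(root)
    summary = {}
    for split in ("train", "val"):
        split_dir = root / split
        if not split_dir.is_dir():
            raise FileNotFoundError(f"missing directory: {split_dir}")
        summary[split] = summarize_split(split_dir)
    return summary
```

For a complete ImageNet-1K copy, `check_imagenet_layout("/path/to/imagenet")` should report 1000 classes in each split, with 1,281,167 training images and 50,000 validation images.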

3. Main Results on ImageNet with Pretrained Models

Model	Input Size	FLOPs (G)	Params (M)	Top-1 Acc. (%)	Download
TransXNet-T	224x224	1.8	12.8	81.6	model
TransXNet-S	224x224	4.5	26.9	83.8	model
TransXNet-B	224x224	8.3	48.0	84.6	model

4. Train

To train TransXNet models on ImageNet-1K with 8 GPUs (single node), run:

bash scripts/train_tiny.sh # train TransXNet-T
bash scripts/train_small.sh # train TransXNet-S
bash scripts/train_base.sh # train TransXNet-B

5. Validation

To evaluate TransXNet on ImageNet-1K, run:

MODEL=transxnet_t # transxnet_{t, s, b}
python3 validate.py \
/path/to/imagenet \
--model $MODEL -b 128 \
--pretrained # or --checkpoint /path/to/checkpoint
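
The Top-1 accuracy reported in the results table counts a sample as correct when the highest-scoring class equals the ground-truth label. The sketch below illustrates that metric in plain Python on nested lists. It is not the code used by `validate.py`, and `top1_accuracy` is a hypothetical name chosen for illustration.

```python
def top1_accuracy(logits, labels):
    """Percentage of samples whose highest-scoring class equals the label."""
    if len(logits) != len(labels):
        raise ValueError("logits and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    correct = 0
    for scores, label in zip(logits, labels):
        # Index of the largest score is the predicted class.
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += predicted == label
    return 100.0 * correct / len(labels)


# Four samples over three classes; the third prediction (class 2) is wrong.
example_logits = [
    [0.1, 0.7, 0.2],
    [0.9, 0.05, 0.05],
    [0.2, 0.3, 0.5],
    [0.6, 0.3, 0.1],
]
example_labels = [1, 0, 1, 0]
print(top1_accuracy(example_logits, example_labels))  # 75.0
```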

Object Detection and Semantic Segmentation

Object Detection
Semantic Segmentation

Citation

If you find this project useful for your research, please consider citing:

@article{lou2023transxnet,
  title={TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition},
  author={Lou, Meng and Zhou, Hong-Yu and Yang, Sibei and Yu, Yizhou},
  journal={arXiv preprint arXiv:2310.19380},
  year={2023}
}

Acknowledgment

Our implementation is mainly based on the following codebases. We thank the authors for their excellent work.

poolformer
pytorch-image-models
mmdetection
mmsegmentation

Contact

If you have any questions, please feel free to open an issue or contact me at [email protected].
