
Add ViG models [NeurIPS 2022]

iamhankai opened this issue · 5 comments

Add ViG models from paper: Vision GNN: An Image is Worth Graph of Nodes (NeurIPS 2022), https://arxiv.org/abs/2206.00272

Network architecture plays a key role in deep learning-based computer vision systems. The widely used convolutional neural network and transformer treat the image as a grid or sequence structure, which is not flexible enough to capture irregular and complex objects. In this paper, we propose to represent the image as a graph structure and introduce a new Vision GNN (ViG) architecture to extract graph-level features for visual tasks. We first split the image into a number of patches, which are viewed as nodes, and construct a graph by connecting the nearest neighbors. Based on the graph representation of images, we build our ViG model to transform and exchange information among all the nodes. ViG consists of two basic modules: the Grapher module, with graph convolution for aggregating and updating graph information, and the FFN module, with two linear layers for node feature transformation. Both isotropic and pyramid architectures of ViG are built with different model sizes. Extensive experiments on image recognition and object detection tasks demonstrate the superiority of our ViG architecture. We hope this pioneering study of GNNs on general visual tasks will provide useful inspiration and experience for future research.
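The core construction in the abstract (patches as graph nodes, edges to nearest neighbors in feature space) can be sketched as below. This is an illustrative NumPy version with made-up names, not the authors' implementation:

```python
import numpy as np

def knn_graph(features, k):
    """Connect each patch (node) to its k nearest neighbors by Euclidean distance.

    features: (N, D) array of patch features; returns (N, k) neighbor indices.
    """
    # pairwise squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = np.sum(features ** 2, axis=1)
    dist = sq[:, None] + sq[None, :] - 2.0 * features @ features.T
    np.fill_diagonal(dist, np.inf)  # exclude self-loops
    return np.argsort(dist, axis=1)[:, :k]

# toy example: 4 "patches" with 2-D features, two well-separated pairs
feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
edges = knn_graph(feats, k=1)  # each node's single nearest neighbor
```

In ViG this graph is rebuilt per layer from the current node features, and the Grapher module then aggregates each node's neighbors via graph convolution; note the dense pairwise-distance step above is quadratic in the number of patches, which is relevant to the runtime discussion later in this thread.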

| Model          | Params (M) | FLOPs (B) | Top-1 |
|----------------|------------|-----------|-------|
| Pyramid ViG-Ti | 10.7       | 1.7       | 78.5  |
| Pyramid ViG-S  | 27.3       | 4.6       | 82.1  |
| Pyramid ViG-M  | 51.7       | 8.9       | 83.1  |
| Pyramid ViG-B  | 82.6       | 16.8      | 83.7  |

iamhankai · Dec 05 '22 15:12

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@iamhankai FYI, you can use @register_notrace_function and @register_notrace_module to register leaf functions or modules in your model that won't trace in FX due to boolean checks and other control-flow concerns...
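A sketch of the registration pattern being suggested. The import path below is from timm ~0.6-era `timm.models.fx_features` (it moved in later releases), so this falls back to a no-op stand-in when timm isn't installed; the function body is a hypothetical example, not ViG code:

```python
# Hedged sketch: mark an FX-untraceable function as a leaf so symbolic
# tracing skips its body instead of failing on data-dependent branching.
try:
    from timm.models.fx_features import register_notrace_function
except ImportError:
    def register_notrace_function(fn):
        # stand-in: the real decorator records fn as an FX leaf and returns it
        return fn

@register_notrace_function
def dynamic_topk(scores, k):
    # data-dependent control flow like this `if` breaks torch.fx symbolic
    # tracing, which is why the function is registered as a leaf
    if k > len(scores):
        k = len(scores)
    return sorted(scores, reverse=True)[:k]
```

Calling `dynamic_topk([0.2, 0.9, 0.5], 2)` returns `[0.9, 0.5]`; the decorator leaves the function's behavior unchanged and only affects how FX tracing treats it.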

rwightman · Dec 07 '22 00:12

Hmm, seems the tracing issue is harder to solve; just preventing the trace won't bypass the bool issue without some restructuring. I'd also need to tweak some other interface issues w.r.t. other models.

Trying the model out, the 'base' as an example seems roughly on par with a Swin (v1) base for accuracy and params/FLOPs, but it runs at less than half the speed. Any way to improve the runtime performance?

Have there been any weights or attempts to scale the training to larger datasets? Any interesting performance differences there vs other ViT or ViT-related hybrid archs?

rwightman · Dec 08 '22 01:12

We have pretrained ViG on ImageNet-22K. It performs slightly better than Swin Transformer:

| Model         | Params (M) | FLOPs (B) | IN1K Top-1 |
|---------------|------------|-----------|------------|
| Swin-S        | 50         | 8.7       | 83.2       |
| Pyramid ViG-M | 51.7       | 8.9       | 83.8       |

As for the runtime, accelerating GNN is an open problem.

iamhankai · Dec 08 '22 03:12

@rwightman Hi, we released the weights from scaling training to the larger ImageNet-22K dataset: https://github.com/huawei-noah/Efficient-AI-Backbones/releases/download/pyramid-vig/pvig_m_im21k_90e.pth

It performs slightly better than the IN22K-pretrained Swin Transformer:

| Model         | Params (M) | FLOPs (B) | IN1K Top-1 |
|---------------|------------|-----------|------------|
| Swin-S        | 50         | 8.7       | 83.2       |
| Pyramid ViG-M | 51.7       | 8.9       | 83.8       |

iamhankai · Apr 03 '23 09:04