VLCounter: Text-aware Visual Representation for Zero-Shot Object Counting

Paper (ArXiv)

Official implementation of the AAAI 2024 paper "VLCounter: Text-aware Visual Representation for Zero-Shot Object Counting".

Update

🔥🔥🔥 [Dec 9] Our paper was accepted to AAAI 2024.

🔥🔥🔥 [Dec 28] Code and pretrained model have been released.

Preparation
Run the Code
Visualization
Citation
Acknowledgements

Preparation

1. Download datasets

This project uses the FSC147 and IOCfish5k datasets. Please download them from their official pages and arrange them as shown in the directory tree below.

CARPK and PUCPR+ are loaded through the hub package, so they do not appear in the tree. See the hub documentation for more information.

/
├─VLCounter/
│
├─FSC147/    
│  ├─gt/
│  ├─image/
│  ├─ImageClasses_FSC147.txt
│  ├─Train_Test_Val_FSC_147.json
│  ├─annotation_FSC147_384.json
│  
├─IOCfish5k/
│  ├─annotations/
│  ├─images/
│  ├─test_id.txt
│  ├─train_id.txt
│  ├─val_id.txt
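
Before training, it can help to confirm that the layout matches the tree above. The following is a minimal sketch, not part of the repository: the function name `missing_entries` is hypothetical, and it assumes the three `*_id.txt` entries are plain files and that the two dataset folders sit side by side under one root.

```python
from pathlib import Path

# Expected entries, copied from the directory tree above.
# Each value is a list of (name, is_directory) pairs.
EXPECTED = {
    "FSC147": [
        ("gt", True),
        ("image", True),
        ("ImageClasses_FSC147.txt", False),
        ("Train_Test_Val_FSC_147.json", False),
        ("annotation_FSC147_384.json", False),
    ],
    "IOCfish5k": [
        ("annotations", True),
        ("images", True),
        ("test_id.txt", False),   # assumption: a plain file, not a folder
        ("train_id.txt", False),  # assumption: a plain file, not a folder
        ("val_id.txt", False),    # assumption: a plain file, not a folder
    ],
}


def missing_entries(root):
    """Return relative paths under `root` that are absent or of the wrong kind."""
    root = Path(root)
    missing = []
    for dataset, entries in EXPECTED.items():
        for name, is_dir in entries:
            path = root / dataset / name
            found = path.is_dir() if is_dir else path.is_file()
            if not found:
                missing.append(f"{dataset}/{name}")
    return missing


if __name__ == "__main__":
    # Assumption: run from the directory that contains FSC147/ and IOCfish5k/.
    for entry in missing_entries("."):
        print("missing:", entry)
```

An empty result means every entry from the tree was found. The script does not inspect file contents.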

2. Install required Python packages

The following package versions (CUDA 11.1 builds of PyTorch) are suitable for an NVIDIA RTX A6000.

pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
pip install hub
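
After installation, the pinned versions can be compared with what is actually installed. This is a hedged sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper names are hypothetical. It assumes a pin without a `+cuXXX` suffix, such as `torchaudio==0.10.0`, should accept any local build of that version, which mirrors how pip treats such a pin.

```python
from importlib import metadata

# Versions pinned by the pip command above.
PINNED = {
    "torch": "1.10.0+cu111",
    "torchvision": "0.11.0+cu111",
    "torchaudio": "0.10.0",
}


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def matches(pinned, found):
    """Compare a pinned version with an installed one.

    A pin with a local label (e.g. '+cu111') must match exactly; a pin
    without one is compared against the part of `found` before the '+'.
    """
    if found is None:
        return False
    if "+" in pinned:
        return found == pinned
    return found.split("+", 1)[0] == pinned


def version_report(pinned=PINNED, lookup=installed_version):
    """Map each package name to a (pinned, installed, ok) tuple."""
    return {
        package: (wanted, lookup(package), matches(wanted, lookup(package)))
        for package, wanted in pinned.items()
    }


if __name__ == "__main__":
    for package, (wanted, found, ok) in version_report().items():
        status = "ok" if ok else "MISMATCH"
        print(f"{package}: pinned {wanted}, installed {found} [{status}]")
```

The check reads package metadata only, so it does not import torch or touch the GPU.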

If you prefer a Docker environment, pull the image with the command below:

docker pull sgkang0305/vlcounter

3. Download CLIP weight and Byte pair encoding (BPE) file

Please download the CLIP pretrained weight and place the file in the "pretrain" folder.

Please download the BPE file and place it in the "tools/dataset" folder.

Run the Code

Train

You can train the model using the following command. Make sure to check the options in the train.sh file.

bash scripts/train.sh FSC {gpu_id} {exp_number}

Evaluation

You can evaluate a trained checkpoint with the following command. Make sure to check the options in the test.sh file, especially '--ckpt_used', which specifies the weight file to load.

bash scripts/test.sh FSC {gpu_id} {exp_number}
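
Both scripts above take the same three positional arguments: a dataset name, a GPU id, and an experiment number. As an illustration of how the placeholders are filled in, here is a small sketch that builds the command line from Python. The helper `build_command` is hypothetical and not part of the repository, and the values `0` and `1` are example inputs only.

```python
import shlex


def build_command(script, dataset, gpu_id, exp_number):
    """Build the argument list for scripts/train.sh or scripts/test.sh."""
    if script not in ("train", "test"):
        raise ValueError(f"unknown script: {script!r}")
    return ["bash", f"scripts/{script}.sh", dataset, str(gpu_id), str(exp_number)]


if __name__ == "__main__":
    # Assumption: GPU 0 and experiment number 1 are example values only.
    for script in ("train", "test"):
        print(shlex.join(build_command(script, "FSC", 0, 1)))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to launch the run.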

We provide a pre-trained checkpoint of our full model, which achieves quantitative results similar to those presented in the paper.

FSC val MAE	FSC val RMSE	FSC test MAE	FSC test RMSE
18.06	65.13	17.05	106.16

CARPK MAE	CARPK RMSE	PUCPR+ MAE	PUCPR+ RMSE
6.46	8.68	48.94	69.08
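
The tables report MAE and RMSE. Assuming these follow the usual object-counting definitions (the per-image error between predicted and ground-truth counts, averaged over the dataset), they can be sketched as below. This is an illustration with made-up counts, not the repository's evaluation code.

```python
import math


def _check(predicted, ground_truth):
    """Validate that both sequences are non-empty and of equal length."""
    if len(predicted) != len(ground_truth) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(predicted, ground_truth):
    """Mean absolute error between predicted and ground-truth counts."""
    _check(predicted, ground_truth)
    errors = [abs(p - g) for p, g in zip(predicted, ground_truth)]
    return sum(errors) / len(errors)


def rmse(predicted, ground_truth):
    """Root mean squared error between predicted and ground-truth counts."""
    _check(predicted, ground_truth)
    squared = [(p - g) ** 2 for p, g in zip(predicted, ground_truth)]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    # Made-up counts for four images; the last one is a large miss.
    predicted = [10, 20, 30, 100]
    ground_truth = [12, 18, 30, 200]
    print("MAE :", mae(predicted, ground_truth))
    print("RMSE:", rmse(predicted, ground_truth))
```

Because RMSE squares each error before averaging, a few images with very large misses raise it much more than they raise MAE.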

Visualization

Citation

Please consider citing our paper if you find it useful in your research :).

@inproceedings{kang2024vlcounter,
  title={VLCounter: Text-Aware Visual Representation for Zero-Shot Object Counting},
  author={Kang, Seunggu and Moon, WonJun and Kim, Euiyeon and Heo, Jae-Pil},
  booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
  volume={38},
  number={3},
  pages={2714--2722},
  year={2024}
}

Acknowledgements

This project is based on the implementation of CounTR.
