VLCounter
[AAAI 2024] VLCounter: Text-aware Visual Representation for Zero-Shot Object Counting
Paper (ArXiv)

Official Implementation for AAAI 2024 paper VLCounter: Text-aware Visual Representation for Zero-Shot Object Counting
Update
🔥🔥🔥 [Dec 9] Our paper has been accepted to AAAI 2024.
🔥🔥🔥 [Dec 28] Code and pretrained model have been released.
Contents
- Preparation
- Run the Code
- Visualization
- Citation
- Acknowledgements
Preparation
1. Download datasets
This project uses the FSC147 and IOCfish5k datasets. Download them and arrange the files as shown in the directory tree below.
CARPK and PUCPR+ are loaded through the hub package.
/
├─VLCounter/
│
├─FSC147/
│ ├─gt/
│ ├─image/
│ ├─ImageClasses_FSC147.txt
│ ├─Train_Test_Val_FSC_147.json
│ ├─annotation_FSC147_384.json
│
├─IOCfish5k/
│ ├─annotations/
│ ├─images/
│ ├─test_id.txt
│ ├─train_id.txt
│ ├─val_id.txt
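Before training, it can help to confirm that the datasets are laid out as in the tree above. The following is a minimal sketch, not part of the repository: the helper name `missing_entries` is hypothetical, and it assumes the dataset folders sit next to the VLCounter folder, as the tree suggests.

```python
from pathlib import Path

# Expected entries, transcribed from the directory tree above.
# Paths are relative to the folder that contains VLCounter/, FSC147/ and IOCfish5k/.
EXPECTED = {
    "FSC147": [
        "gt",
        "image",
        "ImageClasses_FSC147.txt",
        "Train_Test_Val_FSC_147.json",
        "annotation_FSC147_384.json",
    ],
    "IOCfish5k": [
        "annotations",
        "images",
        "test_id.txt",
        "train_id.txt",
        "val_id.txt",
    ],
}


def missing_entries(root):
    """Return the expected files/folders that do not exist under `root`.

    Hypothetical helper for illustration; it only checks for existence,
    not for the contents of each file or folder.
    """
    root = Path(root)
    missing = []
    for dataset, entries in EXPECTED.items():
        for name in entries:
            path = root / dataset / name
            if not path.exists():
                # Report paths with forward slashes on every platform.
                missing.append(path.relative_to(root).as_posix())
    return missing
```

Run from inside the VLCounter folder, `missing_entries("..")` should return an empty list once both datasets are in place; otherwise it lists what is still missing.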
2. Install the required Python packages:
The package versions below are suitable for an NVIDIA RTX A6000.
pip install torch==1.10.0+cu111 torchvision==0.11.0+cu111 torchaudio==0.10.0 -f https://download.pytorch.org/whl/torch_stable.html
pip install -r requirements.txt
pip install hub
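After installation, the pinned versions can be compared against what is actually installed. This is a sketch under stated assumptions rather than a repository script: `check_pins` and `base_version` are hypothetical names, the pins are copied from the pip command above, and Python 3.8 or newer is assumed for `importlib.metadata`.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the pip command above. Local build tags such as
# "+cu111" are stripped before comparing.
PINNED = {"torch": "1.10.0", "torchvision": "0.11.0", "torchaudio": "0.10.0"}


def base_version(v):
    """Drop a local build tag, e.g. '1.10.0+cu111' -> '1.10.0'."""
    return v.split("+", 1)[0]


def check_pins(pins=PINNED, lookup=version):
    """Map each package name to (installed_version_or_None, matches_pin).

    `lookup` defaults to importlib.metadata.version and is a parameter
    only so the function can be exercised without the packages installed.
    """
    report = {}
    for name, wanted in pins.items():
        try:
            installed = lookup(name)
        except PackageNotFoundError:
            installed = None
        ok = installed is not None and base_version(installed) == wanted
        report[name] = (installed, ok)
    return report
```

Printing `check_pins()` in the target environment shows, per package, the installed version and whether it matches the pin. It does not check that CUDA itself is usable.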
If you prefer a Docker environment, pull the image with the command below:
docker pull sgkang0305/vlcounter
3. Download the CLIP weights and the byte pair encoding (BPE) file
Download the CLIP pretrained weights and place the file in the "pretrain" folder.
Download the BPE file and place it in the "tools/dataset" folder.
Run the Code
Train
You can train the model using the following command. Make sure to check the options in the train.sh file.
bash scripts/train.sh FSC {gpu_id} {exp_number}
Evaluation
You can evaluate a trained checkpoint with the following command. Make sure to check the options in the test.sh file, especially '--ckpt_used', which specifies the weight file to load.
bash scripts/test.sh FSC {gpu_id} {exp_number}
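To train and then evaluate the same experiment in one go, the two commands above can be assembled programmatically. The sketch below is an illustration only: `build_command` is a hypothetical helper, and it assumes the scripts take exactly the three positional arguments shown above (dataset, GPU id, experiment number).

```python
def build_command(stage, gpu_id, exp_number, dataset="FSC"):
    """Return the argument list for scripts/train.sh or scripts/test.sh.

    Hypothetical helper; it mirrors the two commands shown above and
    does not know about any options set inside the shell scripts.
    """
    if stage not in ("train", "test"):
        raise ValueError("stage must be 'train' or 'test'")
    return ["bash", f"scripts/{stage}.sh", dataset, str(gpu_id), str(exp_number)]


# Example usage (run from the repository root):
#   import subprocess
#   for stage in ("train", "test"):
#       subprocess.run(build_command(stage, gpu_id=0, exp_number=1), check=True)
```

Passing `check=True` to `subprocess.run` stops the loop if training fails, so evaluation is never run against a missing checkpoint.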
We provide a pre-trained checkpoint of our full model, which yields quantitative results similar to those reported in the paper.
| FSC val MAE | FSC val RMSE | FSC test MAE | FSC test RMSE |
|---|---|---|---|
| 18.06 | 65.13 | 17.05 | 106.16 |

| CARPK MAE | CARPK RMSE | PUCPR+ MAE | PUCPR+ RMSE |
|---|---|---|---|
| 6.46 | 8.68 | 48.94 | 69.08 |
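For readers unfamiliar with the two metrics in the tables, the sketch below shows how MAE and RMSE are commonly computed over per-image counts. It assumes the standard definitions used in object counting and is not taken from the repository's evaluation code; the function names are illustrative.

```python
import math


def mae(pred_counts, gt_counts):
    """Mean absolute error between predicted and ground-truth counts."""
    if len(pred_counts) != len(gt_counts) or not gt_counts:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - g) for p, g in zip(pred_counts, gt_counts))
    return total / len(gt_counts)


def rmse(pred_counts, gt_counts):
    """Root mean squared error between predicted and ground-truth counts."""
    if len(pred_counts) != len(gt_counts) or not gt_counts:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum((p - g) ** 2 for p, g in zip(pred_counts, gt_counts))
    return math.sqrt(total / len(gt_counts))
```

Because RMSE squares each error before averaging, it penalises large per-image miscounts more heavily than MAE, so a wide gap between the two usually points to a few images with very large errors.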
Visualization

Citation
Consider citing us if you find our paper useful in your research :).
@inproceedings{kang2024vlcounter,
title={VLCounter: Text-Aware Visual Representation for Zero-Shot Object Counting},
author={Kang, Seunggu and Moon, WonJun and Kim, Euiyeon and Heo, Jae-Pil},
booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
volume={38},
number={3},
pages={2714--2722},
year={2024}
}
Acknowledgements
This project builds on the CounTR implementation.