CLEVER
Visually Grounded Commonsense Knowledge Acquisition
The code and datasets for our AAAI 2023 paper Visually Grounded Commonsense Knowledge Acquisition.
Overview

In this work, we propose to formulate Commonsense Knowledge Extraction (CKE) as a distantly supervised multi-instance learning problem. Given an entity pair (such as person-bottle) and its associated images, our model first understands the entity interactions in each image, and then selects the informative instances to summarize the commonsense relations between the pair. We present a dedicated CKE framework, CLEVER, which integrates VLP models with contrastive attention to deal with complex commonsense relation learning. You can find more details in our paper.
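To make the formulation concrete, here is a minimal sketch of multi-instance aggregation with contrastive attention. It is an illustration rather than the exact model in this repository: the feature shapes, the two learnable queries, and the scoring scheme are all simplifying assumptions.

import torch
import torch.nn.functional as F

def bag_relation_logits(instance_feats, pos_query, neg_query, classifier):
    # instance_feats: (num_images, dim) features of one entity pair across
    # its bag of images, e.g. from a VLP encoder (assumed shape).
    # pos_query / neg_query: (dim,) learnable queries scoring how informative
    # vs. uninformative each instance looks.
    pos_score = instance_feats @ pos_query            # (num_images,)
    neg_score = instance_feats @ neg_query            # (num_images,)
    # Contrastive attention: emphasize informative instances, suppress noise.
    att = F.softmax(pos_score - neg_score, dim=0)
    # Weighted sum summarizes the bag into a single pair-level representation.
    bag_repr = (att.unsqueeze(-1) * instance_feats).sum(dim=0)
    return classifier(bag_repr)                       # relation logits

# Toy usage: 8 images of the pair (person, bottle), 20 candidate predicates.
feats = torch.randn(8, 768)
logits = bag_relation_logits(feats, torch.randn(768), torch.randn(768),
                             torch.nn.Linear(768, 20))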
Installation
Check INSTALL.md for installation instructions.
Data Preparation
Check DATASET.md for data preparation.
Training
# Prepare dataset according to 'Data Preparation' Section
cd src/Oscar
bash train.sh
Baselines
Text-based Baselines
We directly use RTP to extract triplets from Conceptual Captions, which contains more than 3 million image captions. The extracted triplets are sorted by frequency for evaluation.
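As a rough sketch of this ranking step (the extractor output format below is an assumption for illustration):

from collections import Counter

def rank_triplets_by_frequency(extracted):
    # extracted: iterable of (subject, predicate, object) tuples, e.g. the
    # output of an RTP-style extractor over caption text (assumed format).
    counts = Counter(extracted)
    return [triplet for triplet, _ in counts.most_common()]

# Toy example with hypothetical extractor output:
triplets = [("person", "hold", "bottle"),
            ("person", "hold", "bottle"),
            ("dog", "chase", "ball")]
print(rank_triplets_by_frequency(triplets)[0])  # ('person', 'hold', 'bottle')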
PLM-based Baselines
# Vanilla-FT
cd src
python vanilla_ft.py
# LAMA and Prompt-FT
cd src
conda activate CLEVER_prompt_env # to resolve dependency conflicts
python prompt_ft.py
Image-based Baseline
cd src/Oscar
bash run_instance_pred_cls.sh
bash run_VRD_baseline.sh
Extracted Commonsense Knowledge
You can download the commonsense knowledge triplets extracted by CLEVER on the test split from here. The data structure is:
[
    (subject, object, predicate, commonsense_confidence),
    ...
]
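For instance, the triplets can be loaded and filtered by confidence along the following lines (a sketch that assumes a pickled Python list and a hypothetical filename; adapt the loader to the actual format of the download):

import pickle

# Hypothetical filename; replace with the path of the downloaded file.
with open("clever_test_triplets.pkl", "rb") as f:
    triplets = pickle.load(f)

# Keep high-confidence commonsense relations, e.g. confidence above 0.9.
confident = [(s, p, o) for s, o, p, conf in triplets if conf > 0.9]
for subj, pred, obj in confident[:5]:
    print(f"({subj}, {pred}, {obj})")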
Citation
Please consider citing this paper if you use the code:
@inproceedings{yao2023clever,
title={Visually Grounded Commonsense Knowledge Acquisition},
author={Yao, Yuan and Yu, Tianyu and Zhang, Ao and Li, Mengdi and Xie, Ruobing and Weber, Cornelius and Liu, Zhiyuan and Zheng, Haitao and Wermter, Stefan and Chua, Tat-Seng and Sun, Maosong},
booktitle={Proceedings of AAAI},
year={2023}
}
License
CLEVER is released under the MIT license. See LICENSE for details.
Acknowledgments
Our implementation is based on the fantastic codebase of Oscar.