CLEVER
Visually Grounded Commonsense Knowledge Acquisition
The code and datasets for our AAAI 2023 paper Visually Grounded Commonsense Knowledge Acquisition.
Overview

In this work, we propose to formulate Commonsense Knowledge Extraction (CKE) as a distantly supervised multi-instance learning problem. Given an entity pair (such as person-bottle) and its associated images, our model first understands the entity interactions in each image, and then selects the informative instances to summarize the commonsense relations between the pair. We present a dedicated CKE framework, CLEVER, which integrates VLP models with contrastive attention to deal with complex commonsense relation learning. You can find more details in our paper.
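To make the formulation concrete, here is a minimal sketch of multi-instance aggregation with contrastive attention. It is an illustration rather than the exact model in this repository: the feature shapes, the two learnable queries, and the scoring scheme are all simplifying assumptions.

import torch
import torch.nn.functional as F

def bag_relation_logits(instance_feats, pos_query, neg_query, classifier):
    # instance_feats: (num_images, dim) features of one entity pair across
    # its bag of images, e.g. from a VLP encoder (assumed shape).
    # pos_query / neg_query: (dim,) learnable queries scoring how informative
    # vs. uninformative each instance looks.
    pos_score = instance_feats @ pos_query            # (num_images,)
    neg_score = instance_feats @ neg_query            # (num_images,)
    # Contrastive attention: emphasize informative instances, suppress noise.
    att = F.softmax(pos_score - neg_score, dim=0)
    # Weighted sum summarizes the bag into a single pair-level representation.
    bag_repr = (att.unsqueeze(-1) * instance_feats).sum(dim=0)
    return classifier(bag_repr)                       # relation logits

# Toy usage: 8 images of the pair (person, bottle), 20 candidate predicates.
feats = torch.randn(8, 768)
logits = bag_relation_logits(feats, torch.randn(768), torch.randn(768),
                             torch.nn.Linear(768, 20))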
Installation
Check INSTALL.md for installation instructions.
Data Preparation
Check DATASET.md for data preparation.
Training
# Prepare dataset according to 'Data Preparation' Section
cd src/Oscar
bash train.sh
Baselines
Text-based Baselines
We directly use RTP to extract triplets from Conceptual Captions, which contains more than 3 million image captions. The extracted triplets are sorted by frequency for evaluation.
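As a rough sketch of this ranking step (the extractor output format below is an assumption for illustration):

from collections import Counter

def rank_triplets_by_frequency(extracted):
    # extracted: iterable of (subject, predicate, object) tuples, e.g. the
    # output of an RTP-style extractor over caption text (assumed format).
    counts = Counter(extracted)
    return [triplet for triplet, _ in counts.most_common()]

# Toy example with hypothetical extractor output:
triplets = [("person", "hold", "bottle"),
            ("person", "hold", "bottle"),
            ("dog", "chase", "ball")]
print(rank_triplets_by_frequency(triplets)[0])  # ('person', 'hold', 'bottle')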
PLM-based Baselines
# Vanilla-FT
cd src
python vanilla_ft.py
# LAMA and Prompt-FT
cd src
conda activate CLEVER_prompt_env # to resolve dependency conflicts
python prompt_ft.py
Image-based Baseline
cd src/Oscar
bash run_instance_pred_cls.sh
bash run_VRD_baseline.sh
Extracted Commonsense Knowledge
You can download the commonsense knowledge triplets extracted by CLEVER on the test split from here. The data structure is:
[
    (subject, object, predicate, commonsense_confidence),
    ...
]
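For instance, the triplets can be loaded and filtered by confidence along the following lines (a sketch that assumes a pickled Python list and a hypothetical filename; adapt the loader to the actual format of the download):

import pickle

# Hypothetical filename; replace with the path of the downloaded file.
with open("clever_test_triplets.pkl", "rb") as f:
    triplets = pickle.load(f)

# Keep high-confidence commonsense relations, e.g. confidence above 0.9.
confident = [(s, p, o) for s, o, p, conf in triplets if conf > 0.9]
for subj, pred, obj in confident[:5]:
    print(f"({subj}, {pred}, {obj})")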
Citation
Please consider citing this paper if you use the code:
@inproceedings{yao2023clever,
title={Visually Grounded Commonsense Knowledge Acquisition},
author={Yao, Yuan and Yu, Tianyu and Zhang, Ao and Li, Mengdi and Xie, Ruobing and Weber, Cornelius and Liu, Zhiyuan and Zheng, Haitao and Wermter, Stefan and Chua, Tat-Seng and Sun, Maosong},
booktitle={Proceedings of AAAI},
year={2023}
}
License
CLEVER is released under the MIT license. See LICENSE for details.
Acknowledgments
Our implementation is based on the fantastic codebase of Oscar.