LaKo
[Paper][IJCKG 2022] LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection
In this paper, we propose LaKo, a knowledge-driven VQA method via Late Knowledge-to-text Injection. To incorporate an external KG effectively, we transform its triples into text and propose a late injection mechanism. Finally, we address VQA as a text-generation task with an effective encoder-decoder paradigm.
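As a rough illustration of the idea above, the sketch below verbalizes KG triples into text and pairs each resulting passage with the question, FiD-style, so the decoder can fuse all passages via cross-attention (the "late" injection). The function names and prompt format are illustrative assumptions, not the repository's actual API.

```python
# A minimal sketch of knowledge-to-text plus late injection.
# All names and the prompt layout are illustrative assumptions.

def verbalize_triple(head, relation, tail):
    """Turn one KG triple into a natural-language sentence."""
    return f"{head} {relation} {tail}."

def build_reader_inputs(question, caption, triples):
    """Pair the question (plus image text, e.g. a caption) with each
    verbalized triple. FiD-style, every passage is encoded independently
    and the decoder fuses them via cross-attention (late injection)."""
    prefix = f"question: {question} caption: {caption}"
    return [f"{prefix} knowledge: {verbalize_triple(*t)}" for t in triples]

print(build_reader_inputs(
    "What fruit is on the table?",
    "a bowl of fruit on a wooden table",
    [("banana", "is a", "fruit"), ("banana", "has color", "yellow")],
))
```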
🔔 News
`2024-02` We release the preprint of our survey *Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey* [Repo].
🌈 Model Architecture

📚 Dependencies
- Python 3
- PyTorch (>= 1.6.0)
- Transformers (version 3.0.2)
- NumPy
- faiss-cpu
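One possible way to set up the environment with pip, using the versions listed above:

```bash
pip install "torch>=1.6.0" transformers==3.0.2 numpy faiss-cpu
```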
🧰 Datasets
Training data and KGs are available here. In contrast to `data_source.zip`, we provide a processing script and some source data for both the VQA2.0 and OKVQA datasets. Both a Baidu Cloud link (password: r42d) and a Google link are provided.
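Since the code builds on FiD (see the Note section below), the processed files plausibly follow a FiD-style JSON layout. The record below is an assumption for orientation only, not a documented schema, and has not been verified against `data_source.zip`:

```python
# Hypothetical record layout, assumed from the upstream FiD format.
example = {
    "question": "What fruit is on the table?",
    "answers": ["banana"],
    "ctxs": [  # verbalized KG triples used as text passages
        {"title": "banana", "text": "banana is a fruit."},
        {"title": "banana", "text": "banana has color yellow."},
    ],
}
```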
🚀 Train
bash run_okvqa_train.sh
or run the full training process to obtain the attention signal for iterative training:
bash run_okvqa_full.sh
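For intuition on this "attention signal", the sketch below shows one common way, following the reader-to-retriever distillation idea cited in the Note section, to turn decoder cross-attention into per-passage relevance scores. The tensor shapes and the aggregation rule are illustrative assumptions, not the repository's exact implementation.

```python
import torch

def passage_scores(cross_attn, n_passages):
    """Aggregate reader cross-attention into one relevance score per
    knowledge passage.

    cross_attn: (layers, heads, target_len, n_passages * passage_len),
    i.e. decoder-to-encoder attention with all passages concatenated.
    """
    layers, heads, tgt_len, _ = cross_attn.shape
    per_passage = cross_attn.view(layers, heads, tgt_len, n_passages, -1)
    # average attention mass received by each passage
    return per_passage.mean(dim=(0, 1, 2, 4))

scores = passage_scores(torch.rand(12, 12, 8, 4 * 64), n_passages=4)
print(scores.shape)  # torch.Size([4]); higher = more attended passage
```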
🚀 Test
bash run_okvqa_test.sh
❗ Note
- (Optional) You can first pre-train LaKo (large version) on VQA2.0 and then fine-tune it on OKVQA for better performance.
- You can open the `.sh` files to modify parameters.
- Recent Transformers releases (e.g., 4.x) differ from the older version 3.0.2, which may lead to unexpected errors (a simple version guard is sketched below).
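As a convenience, and not part of the repository, a one-line guard can catch an incompatible Transformers install early:

```python
import transformers

# Illustrative check only: the repo targets Transformers 3.0.2.
assert transformers.__version__.startswith("3.0"), (
    f"expected Transformers 3.0.x, found {transformers.__version__}"
)
```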
Our code is based on FiD:
- Distilling Knowledge from Reader to Retriever: https://arxiv.org/abs/2012.04584
- GitHub link to FiD
🔬 Paradigm
🤝 Cite:
Please consider citing this paper if you use the code or data from our work.
Thanks a lot :)
@inproceedings{DBLP:conf/jist/0007HCGFP0Z22,
author = {Zhuo Chen and
Yufeng Huang and
Jiaoyan Chen and
Yuxia Geng and
Yin Fang and
Jeff Z. Pan and
Ningyu Zhang and
Wen Zhang},
title = {LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text
Injection},
booktitle = {{IJCKG}},
pages = {20--29},
publisher = {{ACM}},
year = {2022}
}