LaKo
[Paper][IJCKG 2022] LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text Injection
In this paper, we propose LaKo, a knowledge-driven VQA method via Late Knowledge-to-Text Injection. To effectively incorporate an external KG, we convert triples into text and propose a late injection mechanism. Finally, we address VQA as a text generation task with an effective encoder-decoder paradigm.
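As a rough illustration of the knowledge-to-text idea described above, the following sketch verbalizes KG triples and groups them into text passages that a FiD-style reader could encode separately and fuse in the decoder. It is not the repository's actual preprocessing: the function names, the `context` field (a hypothetical textual description of the image), and the `question: ... context: ... knowledge: ...` layout are all assumptions made for illustration.

```python
from typing import List, Tuple

# A KG triple in (head, relation, tail) form.
Triple = Tuple[str, str, str]


def triple_to_text(triple: Triple) -> str:
    """Verbalize a triple as a plain sentence (hypothetical template)."""
    head, relation, tail = triple
    # Relation names such as "used_for" become "used for".
    return f"{head} {relation.replace('_', ' ')} {tail}."


def build_passages(
    question: str,
    context: str,
    triples: List[Triple],
    per_passage: int = 2,
) -> List[str]:
    """Group verbalized triples into passages, each paired with the question.

    `context` stands in for a textual description of the image; this field
    and the passage layout are assumptions, not the repository's format.
    """
    sentences = [triple_to_text(t) for t in triples]
    passages = []
    for start in range(0, len(sentences), per_passage):
        knowledge = " ".join(sentences[start:start + per_passage])
        passages.append(
            f"question: {question} context: {context} knowledge: {knowledge}"
        )
    return passages


if __name__ == "__main__":
    example_triples = [
        ("umbrella", "used_for", "rain"),
        ("rain", "is_a", "weather"),
        ("umbrella", "has_property", "waterproof"),
    ]
    for passage in build_passages(
        "What is the object used for?",
        "a person holding an umbrella",
        example_triples,
    ):
        print(passage)
```

Keeping each passage short and self-contained means the knowledge text only meets the other passages at decoding time, which is one plausible reading of "late" injection in an encoder-decoder setup.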
🔔 News
2024-02: We released a preprint of our survey, Knowledge Graphs Meet Multi-Modal Learning: A Comprehensive Survey [Repo].
🌈 Model Architecture

📚 Dependencies
- Python 3
- PyTorch (>= 1.6.0)
- Transformers (version 3.0.2)
- NumPy
- faiss-cpu
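Before training, it may help to confirm that the installed packages match the list above. The script below is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper names are hypothetical, and the distribution names `torch`, `transformers`, `numpy`, and `faiss-cpu` are assumed to be the pip names of the listed dependencies.

```python
from importlib import metadata
from typing import List, Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '1.6.0' or '1.13.1+cu117' into a tuple of ints for comparison."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '1.6' compares equal to '1.6.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def installed(dist: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if missing."""
    try:
        return metadata.version(dist)
    except metadata.PackageNotFoundError:
        return None


def check_environment() -> List[str]:
    """Collect human-readable problems; an empty list means all checks passed."""
    problems = []
    torch_version = installed("torch")
    if torch_version is None:
        problems.append("PyTorch is not installed")
    elif parse_version(torch_version) < (1, 6, 0):
        problems.append(f"PyTorch {torch_version} is older than 1.6.0")
    transformers_version = installed("transformers")
    if transformers_version != "3.0.2":
        problems.append(
            f"Transformers 3.0.2 expected, found {transformers_version}"
        )
    for dist in ("numpy", "faiss-cpu"):
        if installed(dist) is None:
            problems.append(f"{dist} is not installed")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

The Transformers check is an exact match because the list pins version 3.0.2, while PyTorch only has a lower bound.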
 
🧰 Datasets
- Training Data and KGs are available here.
- In contrast to data_source.zip, we provide a processing script and some source data for both the vqa2 and okvqa datasets. We provide a Baidu Cloud link (password: r42d) and a Google link.
🚀 Train
bash run_okvqa_train.sh
Or run the full training process to obtain the attention signal for iterative training:
bash run_okvqa_full.sh
🚀 Test
bash run_okvqa_test.sh
❗ Note
- (Optional) You can first pre-train LaKo (large version) on VQA2.0, then re-train on OKVQA for better performance.
- You can open the .sh file to modify parameters.
- The latest Transformers versions (e.g., 4.XX.XX) differ from the older version in some respects, which may lead to unexpected errors.
 
Our code is based on FiD:
- Distilling Knowledge from Reader to Retriever: https://arxiv.org/abs/2012.04584.
- Github link to FiD
 
🔬 Paradigm
🤝 Cite:
Please consider citing this paper if you use the code or data from our work.
Thanks a lot :)
@inproceedings{DBLP:conf/jist/0007HCGFP0Z22,
  author    = {Zhuo Chen and
               Yufeng Huang and
               Jiaoyan Chen and
               Yuxia Geng and
               Yin Fang and
               Jeff Z. Pan and
               Ningyu Zhang and
               Wen Zhang},
  title     = {LaKo: Knowledge-driven Visual Question Answering via Late Knowledge-to-Text
               Injection},
  booktitle = {{IJCKG}},
  pages     = {20--29},
  publisher = {{ACM}},
  year      = {2022}
}