flex-dm
[CVPR 2023 highlight] Towards Flexible Multi-modal Document Models
Towards Flexible Multi-modal Document Models (CVPR2023)
This repository is the official implementation of the paper titled above. Please refer to the project page or the paper for more details.
Setup
Requirements
We have verified reproducibility in the following environment.
- Python 3.7
- CUDA 11.3
- TensorFlow 2.8
How to install
Install the Python dependencies, preferably inside a virtual environment (venv).
pip install -r requirements.txt
Note that each TensorFlow version has specific system requirements for GPU environments. Check that a compatible CUDA/cuDNN runtime is installed.
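One quick way to check the GPU setup is to ask TensorFlow which CUDA/cuDNN versions it was built against and which GPUs it can see. The snippet below is a minimal sketch and not part of this repository: the function name `describe_tf_gpu_setup` is made up for illustration, and it assumes TensorFlow 2.3 or later, where `tf.sysconfig.get_build_info()` is available.

```python
def describe_tf_gpu_setup():
    """Return a small report on the TensorFlow build and the GPUs it can see."""
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this environment.
        return {"tensorflow": None}

    # Build-time metadata; CPU-only builds may not carry the CUDA keys,
    # so use .get() instead of indexing.
    build = tf.sysconfig.get_build_info()
    return {
        "tensorflow": tf.__version__,
        "cuda_build": build.get("cuda_version"),
        "cudnn_build": build.get("cudnn_version"),
        "visible_gpus": [d.name for d in tf.config.list_physical_devices("GPU")],
    }


if __name__ == "__main__":
    for key, value in describe_tf_gpu_setup().items():
        print(f"{key}: {value}")
```

An empty `visible_gpus` list on a machine that has a GPU usually suggests that the installed CUDA/cuDNN runtime does not match the versions reported for the build.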
Crello experiments
To try the demo with the pre-trained models:
- Download the pre-processed datasets for crello / rico and unzip them under ./data.
- Download the pre-trained checkpoints for crello / rico and unzip them under ./results.
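Before opening the notebook, it can help to confirm that both archives were unpacked where the code expects them. The following is a minimal sketch, not a script shipped with this repository: the helper name `missing_or_empty` is hypothetical, and it only checks that `./data` and `./results` exist and are non-empty, since the internal layout of the archives is not described here.

```python
from pathlib import Path


def missing_or_empty(root, names=("data", "results")):
    """Return the directories under `root` that are absent or contain nothing."""
    problems = []
    for name in names:
        path = Path(root) / name
        if not path.is_dir() or not any(path.iterdir()):
            problems.append(name)
    return problems


if __name__ == "__main__":
    # Run from the repository root.
    problems = missing_or_empty(".")
    print("OK" if not problems else f"Missing or empty: {problems}")
```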
DEMO
You can test some tasks using the pre-trained models in the notebook.
Training
You can train your own model.
The trainer script takes a few arguments to control hyperparameters.
See src/mfp/mfp/args.py for the list of available options.
If the script throws an out-of-memory error, make sure that no other processes are occupying GPU memory, and reduce --batch_size.
bin/train_mfp.sh crello --masking_method random # Ours-IMP
bin/train_mfp.sh crello --masking_method elem_pos_attr_img_txt # Ours-EXP
bin/train_mfp.sh crello --masking_method elem_pos_attr_img_txt --weights <WEIGHTS> # Ours-EXP-FT
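When launching several variants from a script, the commands above can be assembled programmatically. The helper below is a hedged sketch rather than part of the repository: `build_train_command` is a made-up name, it uses only the flags mentioned in this README (`--masking_method`, `--weights`, `--batch_size`), and it assumes that `--batch_size` takes an integer.

```python
import shlex


def build_train_command(dataset, masking_method, weights=None, batch_size=None):
    """Build the argument list for bin/train_mfp.sh (run from the repository root)."""
    cmd = ["bin/train_mfp.sh", dataset, "--masking_method", masking_method]
    if weights is not None:
        # Fine-tuning variant (Ours-EXP-FT) starts from existing weights.
        cmd += ["--weights", str(weights)]
    if batch_size is not None:
        # Assumption: --batch_size expects an integer value.
        cmd += ["--batch_size", str(batch_size)]
    return cmd


if __name__ == "__main__":
    # Print the two from-scratch crello variants as shell-ready strings.
    for method in ("random", "elem_pos_attr_img_txt"):
        cmd = build_train_command("crello", method)
        print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` to start a run.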
The trainer outputs logs, evaluation results, and checkpoints to tmp/mfp/jobs/<job_id>.
The training progress can be monitored via tensorboard.
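Because every run writes to its own directory under tmp/mfp/jobs, it is often convenient to locate the most recent one. The sketch below is an illustration under assumptions, not repository code: `latest_job_dir` is a hypothetical helper, and it treats the most recently modified subdirectory as the latest job.

```python
from pathlib import Path


def latest_job_dir(jobs_root="tmp/mfp/jobs"):
    """Return the most recently modified job directory, or None if there is none."""
    root = Path(jobs_root)
    if not root.is_dir():
        return None
    job_dirs = [p for p in root.iterdir() if p.is_dir()]
    if not job_dirs:
        return None
    # Assumption: modification time is a good enough proxy for "latest run".
    return max(job_dirs, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    print(latest_job_dir())
```

The printed path can then be given to `tensorboard --logdir`, assuming the TensorBoard event files are written inside the job directory.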
Evaluation
You can perform quantitative evaluation with the following command.
bin/eval_mfp.sh --job_dir <JOB_DIR> (<ADDITIONAL_ARGS>)
See eval.py for <ADDITIONAL_ARGS>.
RICO experiments
DEMO
You can test some tasks using the pre-trained models in the notebook.
Training
The process is almost the same as above.
bin/train_mfp.sh rico --masking_method random # Ours-IMP
bin/train_mfp.sh rico --masking_method elem_pos_attr # Ours-EXP
bin/train_mfp.sh rico --masking_method elem_pos_attr --weights <WEIGHTS> # Ours-EXP-FT
Evaluation
The process is the same as above.
Citation
If you find this code useful for your research, please cite our paper.
@inproceedings{inoue2023document,
title={{Towards Flexible Multi-modal Document Models}},
author={Naoto Inoue and Kotaro Kikuchi and Edgar Simo-Serra and Mayu Otani and Kota Yamaguchi},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2023},
pages={14287-14296},
}