Towards Flexible Multi-modal Document Models (CVPR2023)

This repository is the official implementation of the paper titled above. Please refer to the project page or the paper for more details.

Setup

Requirements

We have verified reproducibility under the following environment.

Python 3.7
CUDA 11.3
TensorFlow 2.8

How to install

Install the Python dependencies, preferably inside a virtual environment (venv).

pip install -r requirements.txt

Note that TensorFlow has version-specific system requirements for GPU environments. Check that a compatible CUDA/cuDNN runtime is installed.
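As a quick sanity check after installation, the sketch below reports the Python version, the installed TensorFlow version, the GPUs TensorFlow can see, and the CUDA/cuDNN versions the TensorFlow build was compiled against. The function name describe_environment is only illustrative and is not part of this repository.

```python
import sys


def describe_environment():
    """Collect the Python version and, if TensorFlow is importable, its GPU view."""
    info = {
        "python": "{}.{}".format(*sys.version_info[:2]),
        "tensorflow": None,
        "gpus": [],
        "cuda": None,
        "cudnn": None,
    }
    try:
        import tensorflow as tf
    except ImportError:
        # TensorFlow is not installed in this interpreter; report that and stop.
        return info

    info["tensorflow"] = tf.__version__
    # Names of the GPUs TensorFlow can use; an empty list means CPU only.
    info["gpus"] = [d.name for d in tf.config.list_physical_devices("GPU")]
    # CUDA/cuDNN versions this build was compiled against
    # (the keys are absent on CPU-only builds, hence .get).
    build = tf.sysconfig.get_build_info()
    info["cuda"] = build.get("cuda_version")
    info["cudnn"] = build.get("cudnn_version")
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

An empty gpus list on a machine that has a GPU usually means the installed CUDA/cuDNN runtime does not match what this TensorFlow build expects.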

Crello experiments

To try the demo with pre-trained models:

Download the pre-processed datasets for crello / rico and unzip them under ./data.
Download the pre-trained checkpoints for crello / rico and unzip them under ./results.
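Before opening the notebook, it may help to confirm that both archives were unzipped into the expected places. The sketch below only checks that ./data and ./results exist and are non-empty; it assumes nothing about the files inside them, and the helper name missing_assets is hypothetical.

```python
from pathlib import Path


def missing_assets(root=".", required=("data", "results")):
    """Return the required directories that are absent or empty under root."""
    missing = []
    for name in required:
        path = Path(root) / name
        # A directory counts as present only if it exists and contains something.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_assets()
    if missing:
        print("missing or empty:", ", ".join(missing))
    else:
        print("data and results are in place")
```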

DEMO

You can test some tasks using the pre-trained models in the notebook.

Training

You can train your own model. The trainer script takes a few arguments to control hyperparameters. See src/mfp/mfp/args.py for the list of available options. If the script throws an out-of-memory error, make sure that other processes are not occupying GPU memory, and reduce --batch_size.

bin/train_mfp.sh crello --masking_method random  # Ours-IMP
bin/train_mfp.sh crello --masking_method elem_pos_attr_img_txt  # Ours-EXP
bin/train_mfp.sh crello --masking_method elem_pos_attr_img_txt --weights <WEIGHTS>   # Ours-EXP-FT
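If you launch several variants from a script, the commands above can be assembled programmatically. The following is a minimal sketch, not part of this repository: the helper build_train_command and the short variant names "IMP" and "EXP" are hypothetical, while the script path, the masking methods, and the --weights and --batch_size flags are the ones listed in this README. Passing weights together with "EXP" corresponds to Ours-EXP-FT.

```python
import shlex

# Masking method used for each dataset's Ours-EXP variant, as listed in this README.
EXP_MASKING = {"crello": "elem_pos_attr_img_txt", "rico": "elem_pos_attr"}


def build_train_command(dataset, variant, weights=None, batch_size=None):
    """Return the bin/train_mfp.sh argument list for a dataset and variant."""
    if dataset not in EXP_MASKING:
        raise ValueError(f"unknown dataset: {dataset}")
    if variant == "IMP":
        masking = "random"
    elif variant == "EXP":
        masking = EXP_MASKING[dataset]
    else:
        raise ValueError(f"unknown variant: {variant}")

    cmd = ["bin/train_mfp.sh", dataset, "--masking_method", masking]
    if weights is not None:
        # Initialize from existing weights (Ours-EXP-FT when variant is "EXP").
        cmd += ["--weights", str(weights)]
    if batch_size is not None:
        # Lower this value if training runs out of GPU memory.
        cmd += ["--batch_size", str(batch_size)]
    return cmd


if __name__ == "__main__":
    cmd = build_train_command("crello", "IMP")
    # Print the command; pass cmd to subprocess.run(cmd, check=True) to execute it.
    print(" ".join(shlex.quote(part) for part in cmd))
```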

The trainer writes logs, evaluation results, and checkpoints to tmp/mfp/jobs/<job_id>. Training progress can be monitored with TensorBoard.
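To find the directory of the most recent run, something like the sketch below can be used. It assumes that each job is a subdirectory of tmp/mfp/jobs and uses the modification time as a stand-in for "most recent"; the helper name latest_job_dir is hypothetical, and pointing TensorBoard at the job directory assumes the logs sit directly inside it.

```python
from pathlib import Path


def latest_job_dir(jobs_root="tmp/mfp/jobs"):
    """Return the most recently modified job directory, or None if there is none."""
    root = Path(jobs_root)
    if not root.is_dir():
        return None
    job_dirs = [p for p in root.iterdir() if p.is_dir()]
    if not job_dirs:
        return None
    # Modification time is used as a stand-in for "most recent run".
    return max(job_dirs, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    job_dir = latest_job_dir()
    if job_dir is None:
        print("no job directory found")
    else:
        # Commands to run by hand for monitoring and evaluation.
        print(f"tensorboard --logdir {job_dir}")
        print(f"bin/eval_mfp.sh --job_dir {job_dir}")
```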

Evaluation

You can run a quantitative evaluation of a trained model.

bin/eval_mfp.sh --job_dir <JOB_DIR> (<ADDITIONAL_ARGS>)

See eval.py for <ADDITIONAL_ARGS>.

RICO experiments

DEMO

You can test some tasks using the pre-trained models in the notebook.

Training

The process is almost the same as for the Crello experiments; only the dataset name and the masking method for Ours-EXP differ.

bin/train_mfp.sh rico --masking_method random  # Ours-IMP
bin/train_mfp.sh rico --masking_method elem_pos_attr  # Ours-EXP
bin/train_mfp.sh rico --masking_method elem_pos_attr --weights <WEIGHTS>  # Ours-EXP-FT

Evaluation

The process is the same as for the Crello experiments.

Citation

If you find this code useful for your research, please cite our paper.

@inproceedings{inoue2023document,
    title={{Towards Flexible Multi-modal Document Models}},
    author={Naoto Inoue and Kotaro Kikuchi and Edgar Simo-Serra and Mayu Otani and Kota Yamaguchi},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year={2023},
    pages={14287-14296},
}
