DocTr
The official code for “DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction”, ACM MM, Oral Paper, 2021.
Good news! Our new work, DocScanner: Robust Document Image Rectification with Progressive Learning, achieves state-of-the-art performance on the DocUNet Benchmark dataset.
Good news! A comprehensive list of Awesome Document Image Rectification methods is available.
DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction
ACM MM 2021 Oral
Any questions or discussions are welcome!
Training
DocTr consists of two main components: a geometric unwarping transformer (GeoTr) and an illumination correction transformer (IllTr).
- For geometric unwarping, we train the GeoTr network using the Doc3D dataset.
- For illumination correction, we train the IllTr network on the DRIC dataset.
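Transformers for dense per-pixel tasks are often applied patch-wise to high-resolution images to bound memory. The sketch below shows one generic way to split an image into overlapping tiles, run a per-tile function, and average the overlaps back together. Whether and how IllTr tiles its input is not stated here, so the tile size, stride, averaging scheme, and the per-tile callable `fn` are all hypothetical.

```python
import numpy as np


def tile_starts(length, tile, stride):
    """Start offsets so that tiles of size `tile` cover [0, length)."""
    if length <= tile:
        return [0]
    starts = list(range(0, length - tile + 1, stride))
    if starts[-1] != length - tile:
        starts.append(length - tile)  # make the last tile touch the border
    return starts


def apply_patchwise(img, fn, tile=128, stride=96):
    """Apply `fn` to overlapping tiles of an H x W [x C] image and average overlaps.

    `fn` is a hypothetical per-tile model (e.g. an illumination corrector)
    that returns an array with the same shape as its input tile.
    """
    assert 0 < stride <= tile, "stride must not exceed tile size, or gaps appear"
    h, w = img.shape[:2]
    out = np.zeros(img.shape, dtype=np.float64)
    # One weight per pixel; the trailing singleton axis broadcasts over channels.
    weight = np.zeros((h, w) + (1,) * (img.ndim - 2), dtype=np.float64)
    for y in tile_starts(h, tile, stride):
        for x in tile_starts(w, tile, stride):
            patch = img[y:y + tile, x:x + tile]
            out[y:y + tile, x:x + tile] += fn(patch)
            weight[y:y + tile, x:x + tile] += 1.0
    return out / weight
```

With an identity `fn`, the stitched result equals the input, which is a quick sanity check that every pixel is covered and the overlap averaging is unbiased.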
Inference
- Download the pretrained models from Google Drive or Baidu Cloud, and put them in `$ROOT/model_pretrained/`.
- Put the distorted images in `$ROOT/distorted/`.
- Geometric unwarping: run `python inference.py`. The rectified images are saved in `$ROOT/geo_rec/` by default.
- Geometric unwarping and illumination rectification: run `python inference.py --ill_rec True`. The rectified images are saved in `$ROOT/ill_rec/` by default.
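Rectification methods in this line of work (e.g. DocUNet, DewarpNet) commonly predict a backward map: for each pixel of the rectified output, the coordinate to sample in the distorted image. Assuming GeoTr's output is used the same way, the final resampling step can be sketched in NumPy as bilinear interpolation. The map layout here (shape `(H_out, W_out, 2)`, pixel coordinates, x first) is an assumption, and `inference.py` may instead rely on a library routine such as OpenCV's `cv2.remap`.

```python
import numpy as np


def remap_bilinear(img, bm):
    """Sample `img` (H x W [x C]) at the float coordinates in `bm`.

    bm has shape (H_out, W_out, 2): bm[..., 0] is the x (column) and
    bm[..., 1] the y (row) position in the *distorted* image.
    Out-of-range coordinates are clamped to the image border.
    """
    h, w = img.shape[:2]
    x = np.clip(bm[..., 0], 0, w - 1)
    y = np.clip(bm[..., 1], 0, h - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    y1 = np.minimum(y0 + 1, h - 1)
    wx = x - x0  # fractional offsets in [0, 1)
    wy = y - y0
    if img.ndim == 3:  # broadcast interpolation weights over channels
        wx = wx[..., None]
        wy = wy[..., None]
    img = img.astype(np.float64)
    top = img[y0, x0] * (1 - wx) + img[y0, x1] * wx
    bottom = img[y1, x0] * (1 - wx) + img[y1, x1] * wx
    return top * (1 - wy) + bottom * wy
```

An identity map (`bm[y, x] = (x, y)`) returns the input unchanged; a learned map would instead pull each output pixel from its distorted location.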
Evaluation
- In the DocUNet Benchmark, the distorted images '64_1.png' and '64_2.png' are rotated by 180 degrees and thus do not match their GT documents. This is overlooked by most existing works, so please check these two images before evaluation.
- We use the same MS-SSIM and LD evaluation code as the DocUNet Benchmark, based on Matlab 2019a. Scores may vary across Matlab versions, so please compare scores computed with the same version. Our Matlab interface file is provided at `$ROOT/ssim_ld_eval.m`.
- The indices of the 30 documents (60 images) of the DocUNet Benchmark used for our OCR evaluation are listed in `$ROOT/ocr_img.txt` (Setting 1). Please refer to DewarpNet for the indices of the 25 documents (50 images) of the DocUNet Benchmark used in their OCR evaluation (Setting 2).
- We provide the OCR evaluation code at `$ROOT/OCR_eval.py`. We use pytesseract 0.3.8 and Tesseract 5.0.1.20220118.
- The performance of DocTr is shown in the following table. For the performance of other methods, please refer to DocScanner.
- To reproduce the quantitative results on the DocUNet Benchmark reported in the paper, or for further comparison, use the rectified images available from Google Drive or Baidu Cloud.
| Method | MS-SSIM | LD | ED (Setting 1) | CER (Setting 1) | ED (Setting 2) | CER (Setting 2) |
|---|---|---|---|---|---|---|
| GeoTr | 0.5105 | 7.76 | 464.83 | 0.1746 | 724.84 | 0.1832 |
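The ED and CER columns are OCR metrics: the OCR text of each rectified image (obtained with pytesseract, e.g. `pytesseract.image_to_string`) is compared with the ground-truth text. A minimal sketch, assuming the usual definitions (ED as character-level Levenshtein distance, CER as ED divided by the ground-truth length) and simple averaging over images; `OCR_eval.py` may differ in details such as text normalization or aggregation, and `average_scores` is a hypothetical helper.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    prev = list(range(len(hyp) + 1))  # distances from the empty ref prefix
    for i, rc in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, hc in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,               # delete rc
                cur[j - 1] + 1,            # insert hc
                prev[j - 1] + (rc != hc),  # substitute (free if equal)
            )
        prev = cur
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate, normalized by the ground-truth length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def average_scores(pairs):
    """Mean ED and mean CER over (ground_truth, ocr_output) pairs (hypothetical helper)."""
    eds = [edit_distance(r, h) for r, h in pairs]
    cers = [cer(r, h) for r, h in pairs]
    return sum(eds) / len(eds), sum(cers) / len(cers)
```

For example, `edit_distance("kitten", "sitting")` is 3, so its CER is 3 / 6 = 0.5.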
Citation
If you find this code useful for your research, please cite the following BibTeX entries.
@inproceedings{feng2021doctr,
title={DocTr: Document Image Transformer for Geometric Unwarping and Illumination Correction},
author={Feng, Hao and Wang, Yuechen and Zhou, Wengang and Deng, Jiajun and Li, Houqiang},
booktitle={Proceedings of the 29th ACM International Conference on Multimedia},
pages={273--281},
year={2021}
}
@article{feng2021docscanner,
title={DocScanner: Robust Document Image Rectification with Progressive Learning},
author={Feng, Hao and Zhou, Wengang and Deng, Jiajun and Tian, Qi and Li, Houqiang},
journal={arXiv preprint arXiv:2110.14968},
year={2021}
}
Acknowledgement
The code is largely based on DocUNet, DewarpNet, and DocProj. Thanks for their wonderful work.