Document-Layout-Analysis
Object Detection Model for Scanned Documents
Layout Analysis of Scanned Documents
Document Layout Analysis using YOLOv8
View Demo · Report Bug · Request Feature
Table of Contents
- Updates
- About The Project
  - Built With
- Getting Started
  - Prerequisites
  - Installation
- Works Cited
- Acknowledgments
Updates
In this project, I provide one object detection model, trained starting from the existing pretrained YOLOv8 weights. It is uploaded to the project's Hugging Face Space. If you use or fine-tune the model in any part of your work, please cite this repository. Thank you, and don't forget to give this repo a 🌟!
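For readers who want to try the model, here is a minimal inference sketch using the Ultralytics Python API. It is an illustration, not the code behind the demo. The weights path, image path, and output path are hypothetical placeholders, and the helper names `summarize_detections` and `detect_layout` are my own.

```python
from collections import Counter


def summarize_detections(class_ids, names):
    """Count detections per class name.

    class_ids: iterable of numeric class indices (floats are accepted).
    names: mapping from class index to class name, as exposed by a
           YOLO result's `names` attribute.
    """
    return Counter(names[int(i)] for i in class_ids)


def detect_layout(image_path, weights_path, out_path="annotated.jpg", conf=0.25):
    """Run one image through the model and save an annotated copy.

    weights_path is a placeholder: point it at the weights file you
    downloaded for this project. Imports are done lazily so the helper
    above can be used without ultralytics or opencv-python installed.
    """
    import cv2
    from ultralytics import YOLO

    model = YOLO(weights_path)
    result = model.predict(image_path, conf=conf)[0]

    # result.plot() returns the image with boxes drawn, as a BGR array.
    cv2.imwrite(str(out_path), result.plot())

    return summarize_detections(result.boxes.cls.tolist(), result.names)
```

Calling `detect_layout("page.png", "path/to/weights.pt")` would return a per-class count of the detected layout regions and write the annotated page to `annotated.jpg`. The confidence threshold of 0.25 is an assumed starting value.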
About The Project
Due to limited computational resources, I trained the model only on the DocLayNet-base dataset, which contains 6,910 training images, 648 validation images, and 499 test images. Even so, the model performs relatively well, which suggests that YOLOv8 is a strong choice for document layout analysis.
Built With
Prerequisites
- python 3
- ultralytics
- numpy
- opencv-python
Installation
- Clone the repo
  git clone https://github.com/LynnHaDo/Document-Layout-Analysis.git
- Install packages
  pip install ultralytics
  pip install numpy
  pip install opencv-python
- Download the DocLayNet dataset and save it as datasets/doclaynet-base
- (Optional) Download pretrained YOLOv8s weights
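Once the steps above are done, fine-tuning could look roughly like the sketch below. It is not the training script used for this project. The `images/train`, `images/val`, `images/test` sub-folders, the class index order, the file name `doclaynet.yaml`, and the `epochs` and `imgsz` values are all assumptions to adapt to your own copy of the dataset. The 11 class names are the DocLayNet layout categories.

```python
from pathlib import Path

# The 11 DocLayNet layout categories. The index order is an assumption
# and must match the class ids used in your label files.
DOCLAYNET_CLASSES = [
    "Caption", "Footnote", "Formula", "List-item", "Page-footer",
    "Page-header", "Picture", "Section-header", "Table", "Text", "Title",
]


def build_dataset_yaml(root="datasets/doclaynet-base"):
    """Return the text of an Ultralytics dataset config.

    The images/<split> layout below is assumed, not taken from the repo.
    An absolute path is written so the location does not depend on the
    Ultralytics datasets directory setting.
    """
    lines = [
        f"path: {Path(root).resolve().as_posix()}",
        "train: images/train",
        "val: images/val",
        "test: images/test",
        "names:",
    ]
    lines += [f"  {i}: {name}" for i, name in enumerate(DOCLAYNET_CLASSES)]
    return "\n".join(lines) + "\n"


def train(yaml_path="doclaynet.yaml", weights="yolov8s.pt", epochs=50, imgsz=640):
    """Fine-tune pretrained YOLOv8s weights on the dataset.

    epochs and imgsz are illustrative defaults, not the values used
    to train the published model.
    """
    from ultralytics import YOLO  # lazy import: only needed for training

    Path(yaml_path).write_text(build_dataset_yaml())
    model = YOLO(weights)
    return model.train(data=yaml_path, epochs=epochs, imgsz=imgsz)
```

Calling `train()` writes the config file next to your script and starts a training run. Ultralytics downloads `yolov8s.pt` automatically if it is not already present, which is why the last installation step is optional.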
Works Cited
- Ultralytics YOLOv8
  authors:
    - family-names: Jocher
      given-names: Glenn
      orcid: "https://orcid.org/0000-0001-5950-6979"
    - family-names: Chaurasia
      given-names: Ayush
      orcid: "https://orcid.org/0000-0002-7603-6750"
    - family-names: Qiu
      given-names: Jing
      orcid: "https://orcid.org/0000-0003-3783-7069"
  title: "YOLO by Ultralytics"
  version: 8.0.0
  date-released: 2023-1-10
  license: AGPL-3.0
  url: "https://github.com/ultralytics/ultralytics"
- Doclaynet-base dataset
  @article{doclaynet2022,
    title = {DocLayNet: A Large Human-Annotated Dataset for Document-Layout Segmentation},
    doi = {10.1145/3534678.3539043},
    url = {https://doi.org/10.1145/3534678.3539043},
    author = {Pfitzmann, Birgit and Auer, Christoph and Dolfi, Michele and Nassar, Ahmed S and Staar, Peter W J},
    year = {2022},
    isbn = {9781450393850},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    booktitle = {Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
    pages = {3743–3751},
    numpages = {9},
    location = {Washington DC, USA},
    series = {KDD '22}
  }
Contact
Linh Do - [email protected]/[email protected] (personal)
Project Link: https://github.com/LynnHaDo/Document-Layout-Analysis
LinkedIn: https://linkedin.com/in/Linh Do