LynnHaDo/Document-Layout-Analysis: Object Detection Model for Scanned Documents

Layout Analysis of Scanned Documents

Document Layout Analysis using YOLOv8

Table of Contents

Updates
About The Project
- Built With
Getting Started
- Prerequisites
- Installation
Works Cited
Contact

Updates

In this project, I provide one object detection model fine-tuned from the pretrained YOLOv8 weights. It is uploaded to my Hugging Face Space for the project. If you use or fine-tune the model in any part of your work, please cite this repository. Thank you, and don't forget to give this repo a 🌟!
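As a rough illustration of how the model might be used once its weights are downloaded, the sketch below runs detection on one page image and returns plain dictionaries. The weights filename `best.pt`, the helper names `to_records` and `detect_layout`, and the 0.25 score threshold are assumptions made for this example and do not come from the repository.

```python
def to_records(boxes_xyxy, class_ids, scores, names, min_score=0.25):
    """Turn raw detector outputs into a list of dicts, dropping weak detections."""
    records = []
    for box, cls_id, score in zip(boxes_xyxy, class_ids, scores):
        if score < min_score:
            continue  # skip low-confidence detections
        records.append({
            "label": names[int(cls_id)],
            "score": round(float(score), 3),
            "box": [round(float(v), 1) for v in box],  # x1, y1, x2, y2 in pixels
        })
    return records


def detect_layout(image_path, weights_path="best.pt"):
    """Run the detector on one page image.

    weights_path is a placeholder: point it at the weights file you downloaded.
    """
    from ultralytics import YOLO  # imported lazily; needs `pip install ultralytics`

    result = YOLO(weights_path).predict(image_path)[0]  # one Results object per image
    return to_records(
        result.boxes.xyxy.tolist(),
        result.boxes.cls.tolist(),
        result.boxes.conf.tolist(),
        result.names,  # dict mapping class id to class name
    )
```

Calling `detect_layout("page.png")` with a real image path and weights file would return one dictionary per detected layout element, each with a label, a score, and a bounding box.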

About The Project

Due to limited computational resources, I trained the model only on the DocLayNet-base dataset, which contains 6,910 training images, 648 validation images, and 499 test images. Even so, the model performs relatively well, which suggests that YOLOv8 is a strong fit for document layout detection.


Built With


Prerequisites

Python 3
ultralytics
numpy
opencv-python

Installation

Clone the repo

git clone https://github.com/LynnHaDo/Document-Layout-Analysis.git

Install packages

pip install ultralytics
pip install numpy
pip install opencv-python

Download the DocLayNet-base dataset and save it as datasets/doclaynet-base
(Optional) Download the pretrained YOLOv8s weights
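With the dataset and weights in place, fine-tuning could look roughly like the sketch below. It is not the training script used for this project. The `images/train`, `images/val`, and `images/test` folder layout, the class index order, the config filename `doclaynet.yaml`, the epoch count, and the image size are all assumptions to adjust to your own copy of the data.

```python
from pathlib import Path

# The 11 DocLayNet layout classes. The index order here is an assumption and
# must match the class ids used in your label files.
DOCLAYNET_CLASSES = [
    "Caption", "Footnote", "Formula", "List-item", "Page-footer",
    "Page-header", "Picture", "Section-header", "Table", "Text", "Title",
]


def build_dataset_yaml(root="datasets/doclaynet-base"):
    """Return the text of an Ultralytics dataset config for `root`."""
    # An absolute path keeps the dataset location independent of where
    # training is launched from.
    abs_root = Path(root).resolve().as_posix()
    lines = [
        f"path: {abs_root}",
        "train: images/train",  # assumed folder layout
        "val: images/val",
        "test: images/test",
        "names:",
    ]
    lines += [f"  {i}: {name}" for i, name in enumerate(DOCLAYNET_CLASSES)]
    return "\n".join(lines) + "\n"


def train(config_path="doclaynet.yaml", weights="yolov8s.pt", epochs=50):
    """Fine-tune pretrained YOLOv8s weights (epochs=50 is an arbitrary example)."""
    from ultralytics import YOLO  # imported lazily; needs `pip install ultralytics`

    Path(config_path).write_text(build_dataset_yaml())
    model = YOLO(weights)  # fetched automatically if not found locally
    return model.train(data=config_path, epochs=epochs, imgsz=640)
```

Calling `train()` writes the config file and starts fine-tuning. On limited hardware, a smaller `epochs` value gives a quick check that the data loads correctly before committing to a full run.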


Works Cited

Ultralytics YOLOv8

authors:
 - family-names: Jocher
   given-names: Glenn
   orcid: "https://orcid.org/0000-0001-5950-6979"
 - family-names: Chaurasia
   given-names: Ayush
   orcid: "https://orcid.org/0000-0002-7603-6750"
 - family-names: Qiu
   given-names: Jing
   orcid: "https://orcid.org/0000-0003-3783-7069"
title: "YOLO by Ultralytics"
version: 8.0.0
date-released: 2023-1-10
license: AGPL-3.0
url: "https://github.com/ultralytics/ultralytics"

Doclaynet-base dataset

@article{doclaynet2022,
 title = {DocLayNet: A Large Human-Annotated Dataset for Document-Layout Segmentation},
 doi = {10.1145/3534678.3539043},
 url = {https://doi.org/10.1145/3534678.3539043},
 author = {Pfitzmann, Birgit and Auer, Christoph and Dolfi, Michele and Nassar, Ahmed S and Staar, Peter W J},
 year = {2022},
 isbn = {9781450393850},
 publisher = {Association for Computing Machinery},
 address = {New York, NY, USA},
 booktitle = {Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
 pages = {3743–3751},
 numpages = {9},
 location = {Washington DC, USA},
 series = {KDD '22}
 }

Contact

Linh Do - [email protected]/[email protected] (personal)

Project Link: https://github.com/LynnHaDo/Document-Layout-Analysis

LinkedIn: https://linkedin.com/in/Linh Do
