new codebase structure

Open GreatV opened this issue 1 year ago • 5 comments

The new codebase structure looks like this:

tree -L 3
.
├── LICENSE
├── MANIFEST.in
├── README.md
├── README_en.md
├── VERSION_NUMBER
├── applications
├── benchmark
├── configs
├── deploy
├── doc
├── paddleocr.egg-info
├── pyproject.toml
├── requirements.txt
├── setup.py
├── src
│   └── paddleocr
│       ├── __init__.py
│       ├── __pycache__
│       ├── cli
│       ├── paddleocr.py
│       ├── ppocr
│       ├── ppstructure
│       └── tools
├── test_tipc
├── tests
├── tools
└── train.sh

GreatV avatar Jun 11 '24 06:06 GreatV

What is the reason for creating a separate src directory?

SWHL avatar Jun 11 '24 08:06 SWHL

Hi @SWHL, creating a src directory in a Python project has several benefits:

  1. Clear Project Structure:

    • Separates source code from other files like configuration, documentation, tests, and data, making the project easier to read and maintain.
  2. Avoiding Namespace Clashes:

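    • Keeps importable code out of the project root, which Python often puts at the front of the import path (sys.path), so top-level files and directories there cannot shadow standard-library modules, other dependencies, or the installed package.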
  3. Easier Testing and Development:

    • Makes import paths explicit and ensures tests run against the installed package rather than files that happen to sit in the working directory.
  4. Encouraging Proper Package Installation:

    • Promotes using pip install -e . for development, ensuring correct imports and testing in an environment similar to production.
  5. Facilitating Automation and CI/CD:

    • Simplifies continuous integration and deployment, because source, tests, and packaging metadata each live in a clearly separated location.
  6. Promoting Best Practices:

    • Aligns with recommendations from Python community standards and guides.

Example Project Structure

my_project/
├── .git/
├── .gitignore
├── README.md
├── pyproject.toml
├── requirements.txt
├── src/
│   └── my_package/
│       ├── __init__.py
│       ├── module1.py
│       └── module2.py
├── tests/
│   ├── __init__.py
│   └── test_module1.py
└── docs/
    └── index.md

In this structure:

  • src/my_package/ contains the source code.
  • tests/ contains test files.
  • docs/ contains documentation.
  • Configuration files are at the root for easy access.

This setup ensures a clean separation, making the project more manageable.
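
To make the shadowing point concrete, here is a minimal sketch that builds two throwaway projects, one flat and one with a src directory, and checks whether my_package can be found when only the project root is searched. The helper names make_project and importable_from_root are made up for this illustration, and the check uses the standard library's PathFinder rather than a real import.

```python
import importlib
import tempfile
from importlib.machinery import PathFinder
from pathlib import Path


def make_project(root: Path, use_src: bool) -> Path:
    """Create a throwaway project containing my_package; return the project root."""
    package_parent = root / "src" if use_src else root
    package_dir = package_parent / "my_package"
    package_dir.mkdir(parents=True)
    (package_dir / "__init__.py").write_text("")
    return root


def importable_from_root(project_root: Path, name: str = "my_package") -> bool:
    """Return True if `name` resolves when only the project root is searched."""
    importlib.invalidate_caches()
    return PathFinder.find_spec(name, [str(project_root)]) is not None


with tempfile.TemporaryDirectory() as tmp:
    flat_root = make_project(Path(tmp) / "flat_project", use_src=False)
    src_root = make_project(Path(tmp) / "src_project", use_src=True)
    flat_result = importable_from_root(flat_root)
    src_result = importable_from_root(src_root)

print("flat layout importable from root:", flat_result)
print("src layout importable from root:", src_result)
```

In the flat project the package is picked up straight from the working tree, while in the src project it is not, so it has to be installed (for example with pip install -e .) before it can be imported.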

GreatV avatar Jun 11 '24 08:06 GreatV

I don't think now is the right time for such a big structural change, even though it has many benefits. It would invalidate the issues and discussions that rely on the old structure. I think it would be better to adjust the structure gradually after all the existing bugs are resolved.

Just personal thoughts.

SWHL avatar Jun 11 '24 08:06 SWHL

Hi @SWHL, you're right. This PR is still a work in progress and will continue when conditions are suitable.

GreatV avatar Jun 11 '24 08:06 GreatV

The current codebase layout is neither a flat layout nor a src layout. It's a single-module distribution: __init__.py and paddleocr.py are placed directly under the project root. I think this is a compromise for combining the ppocr module and the ppstructure module into one package.

As this is a mid-sized project, we need to modernize the codebase to a src layout, to clearly separate the source code from the code that uses it and to improve maintainability.

ref: https://packaging.python.org/en/latest/discussions/src-layout-vs-flat-layout/
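
As a rough illustration of the three layouts mentioned above, the sketch below classifies a project by where the package or module sits. detect_layout is a hypothetical helper written for this comment, its rules are a simplified heuristic rather than an official definition, and the two temporary trees only imitate the current and proposed structures.

```python
import tempfile
from pathlib import Path


def detect_layout(project_root: Path, name: str) -> str:
    """Classify where the import package or module `name` lives (heuristic)."""
    if (project_root / "src" / name / "__init__.py").is_file():
        return "src"
    if (project_root / name / "__init__.py").is_file():
        return "flat"
    if (project_root / f"{name}.py").is_file():
        return "single-module"
    return "unknown"


with tempfile.TemporaryDirectory() as tmp:
    # Imitates the current situation: module files directly under the root.
    current = Path(tmp) / "current"
    current.mkdir()
    (current / "__init__.py").write_text("")
    (current / "paddleocr.py").write_text("")

    # Imitates the proposed situation: the package lives under src/.
    proposed = Path(tmp) / "proposed"
    (proposed / "src" / "paddleocr").mkdir(parents=True)
    (proposed / "src" / "paddleocr" / "__init__.py").write_text("")

    current_layout = detect_layout(current, "paddleocr")
    proposed_layout = detect_layout(proposed, "paddleocr")

print("current:", current_layout)
print("proposed:", proposed_layout)
```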

jzhang533 avatar Jun 11 '24 11:06 jzhang533