new codebase structure
The new codebase structure looks like this:
tree -L 3
.
├── LICENSE
├── MANIFEST.in
├── README.md
├── README_en.md
├── VERSION_NUMBER
├── applications
├── benchmark
├── configs
├── deploy
├── doc
├── paddleocr.egg-info
├── pyproject.toml
├── requirements.txt
├── setup.py
├── src
│   └── paddleocr
│       ├── __init__.py
│       ├── __pycache__
│       ├── cli
│       ├── paddleocr.py
│       ├── ppocr
│       ├── ppstructure
│       └── tools
├── test_tipc
├── tests
├── tools
└── train.sh
What is the reason for creating a separate src directory?
hi @SWHL,
Creating a src directory in a Python project has several benefits:
- Clear Project Structure:
  - Separates source code from other files like configuration, documentation, tests, and data, making the project easier to read and maintain.
- Avoiding Namespace Clashes:
  - Keeps the package out of the project root, so running Python from the root does not implicitly pick up the working copy instead of the installed package or another dependency.
- Easier Testing and Development:
  - Clarifies import paths and ensures tests run against the installed version of the code rather than whatever happens to be in the current directory.
- Encouraging Proper Package Installation:
  - Promotes using pip install -e . for development, ensuring correct imports and testing in an environment similar to production.
- Facilitating Automation and CI/CD:
  - Simplifies continuous integration and deployment by keeping the code that gets packaged separate from everything else.
- Promoting Best Practices:
  - Aligns with recommendations from Python community standards and guides.
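To make the namespace and testing points concrete, here is a minimal sketch, not code from this PR. It builds two throwaway projects in a temporary directory and checks whether a package can be found when only the project root is searched. The package name my_package and the helper names make_project and importable_from_root are hypothetical, chosen for illustration.

```python
import importlib
import tempfile
from importlib.machinery import PathFinder
from pathlib import Path


def make_project(root: Path, layout: str) -> Path:
    """Create a tiny project holding a package called `my_package`.

    layout == "flat": <root>/my_package/__init__.py
    layout == "src":  <root>/src/my_package/__init__.py
    """
    parent = root / "src" if layout == "src" else root
    pkg_dir = parent / "my_package"
    pkg_dir.mkdir(parents=True)
    (pkg_dir / "__init__.py").write_text("")
    return root


def importable_from_root(root: Path, name: str = "my_package") -> bool:
    """Return True if `name` is found when only the project root is searched."""
    importlib.invalidate_caches()  # make sure freshly created files are seen
    return PathFinder.find_spec(name, [str(root)]) is not None


def demo() -> dict:
    """Compare both layouts and report whether the root exposes the package."""
    with tempfile.TemporaryDirectory() as tmp:
        flat = make_project(Path(tmp) / "flat_project", "flat")
        src = make_project(Path(tmp) / "src_project", "src")
        return {
            "flat": importable_from_root(flat),
            "src": importable_from_root(src),
        }


if __name__ == "__main__":
    print(demo())
```

With the flat layout the package is visible from the project root, so a test run started there can silently import the working copy. With the src layout it is not visible, so the package has to be installed (for example in editable mode) before it can be imported.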
Example Project Structure
my_project/
├── .git/
├── .gitignore
├── README.md
├── pyproject.toml
├── requirements.txt
├── src/
│   └── my_package/
│       ├── __init__.py
│       ├── module1.py
│       └── module2.py
├── tests/
│   ├── __init__.py
│   └── test_module1.py
└── docs/
    └── index.md
In this structure:
- src/my_package/ contains the source code.
- tests/ contains test files.
- docs/ contains documentation.
- Configuration files are at the root for easy access.
This setup ensures a clean separation, making the project more manageable.
Even though it has many benefits, I don't think now is the right time for such a big structural change. It would invalidate the issues and discussions that rely on the old structure. I think it would be better to adjust the structure gradually after the existing bugs are resolved.
Just personal thoughts.
Hi @SWHL, You're right, this PR is still a work in progress and will continue when conditions are suitable.
The current codebase layout is neither a flat layout nor a src layout.
It's a single-module distribution: __init__.py and paddleocr.py are placed directly under the project root. I think this is a compromise for combining the ppocr module and the ppstructure module into one package.
As a medium-sized project, we need to modernize the codebase to a src layout that clearly separates source code from usage code and improves maintainability.
ref: https://packaging.python.org/en/latest/discussions/src-layout-vs-flat-layout/
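
As a rough illustration of the three layouts mentioned above, the sketch below classifies a project root by where the package's __init__.py lives. It is only an assumption-laden helper, not part of this PR or of any packaging tool; the function name classify_layout and the label "root-as-package" are made up here.

```python
import tempfile
from pathlib import Path


def classify_layout(root: Path, package: str) -> str:
    """Classify a project root by where the import package `package` lives."""
    if (root / "src" / package / "__init__.py").is_file():
        return "src"  # <root>/src/<package>/__init__.py
    if (root / package / "__init__.py").is_file():
        return "flat"  # <root>/<package>/__init__.py
    if (root / "__init__.py").is_file():
        # The project root itself acts as the package directory.
        return "root-as-package"
    return "unknown"


def touch(path: Path) -> None:
    """Create an empty file, including any missing parent directories."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        touch(base / "a" / "src" / "paddleocr" / "__init__.py")
        touch(base / "b" / "paddleocr" / "__init__.py")
        touch(base / "c" / "__init__.py")
        for name in ("a", "b", "c"):
            print(name, classify_layout(base / name, "paddleocr"))
```

Under these assumptions, the layout described in this comment falls into the third case, while the structure proposed in this PR falls into the first.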