tasking-manager
tasking-manager copied to clipboard
build: update backend dependency management to pdm + dockerfile
Closes #4642
Looks like this issue has been open for a while.
Great suggestion by @frafra to use PDM. Thank you.
Notes:
- I updated the Dockerfile.backend to reflect the changes.
- I also added the --no-cache flag to the final stage apk install command to save some space.
- I pinned the Python version number for version solving to >=3.7,<=3.8.
- The values for pyproject.toml were taken from the repo. Let me know if they need to change.
This is only a partial solution, as there are multiple references to requirements.txt throughout the repo.
python-venv is also used, but is now redundant with using pdm.
I have a commit on this branch that I have not pushed yet, removing requirements.txt and modifying any docs / workflows with reference to it (14 files). Let me know if you would like me to push.
Thanks for the PR @swoodcoc. I think it will be easier to review if you push it and maybe you can also describe what the user needs to do to set up the project (like instead of pip install -r requirements.txt
) and how this change benefits the project.
No worries, I'll add it now. I was worried it would increase the scope, as it also affects CI and deployment instructions.
I have updated all references to requirements.txt and using venv that I could find in the repo.
As for the benefit:
- PDM uses a dependency solver to check for compatibility between all dependencies.
- If a package is updated in the future that is incompatible with other package version, then it should be flagged during build.
- I also think PEP 582 dependency management is a lot simpler and reduces the need for virtual environments.
So it actually looks like you can't pip install dependencies directly from a pyproject.toml (I have never had to before). There is a related issue https://github.com/pypa/pip/issues/8049
I wanted to keep the edits to CI etc simple by pip installing the already solved dependency list.
To achieve this we can
pip install toml && python -c 'import toml; c = toml.load("pyproject.toml")
print("\n".join(c["project"]["dependencies"]))' | pip install -r /dev/stdin
I will update.
Everything should be complete to the best of my knowledge.
There seem to be quite a few Dockerfiles for the backend to maintain - I updated them all, as I wasn't sure which ones are redundant or still used by external processes.
Also on a side note: the Dockerfile using quay.io/hotosm/base-python-image
as a base uses Python 3.8.7, while the Dockerfile.backend uses 3.7. Perhaps a refactor is required to manage a consistent Python version.
The dependency install worked without issue (no conflicts) using Python 3.9, so I feel like both could be updated to that anyway.
Thanks for the feedback, I'm more than happy to go through the docs and update as requested.
As for the package installs, are you able to pip install
the packages, but not pdm install
? I would be surprised if this is the case.
I imagine the reason install failed is because of missing libraries on your Linux system, such as proj-dev
.
A local install requires all the libs packaged into the docker image to be installed on your machine: https://github.com/hotosm/tasking-manager/blob/4757148e011f2c66f3452a588379e67daef0e3af/Dockerfile#L11-L22
The Dockerfile above uses Alpine, so the packages would be named differently for example on a Debian based distro (e.g. Ubuntu). proj-dev
--> libproj-dev
etc.
I rebased and added the suggested changes to the docs.
Kudos, SonarCloud Quality Gate passed!
0 Bugs
0 Vulnerabilities
0 Security Hotspots
0 Code Smells
No Coverage information
0.0% Duplication
Related to #5705
I could update this PR and rebase with the current develop
, if of interest.
Thank you, @spwoodcock . We are currently swamped and unable to review your upgrades immediately. Can you please give us a week or two to get develop
branch cleaned-up and in order? I will keep you posted or you can follow the issues here on GH.
@ramyaragupathy @Aadesh-Baral let us discuss this in our next meeting and go over timelines and effort involved.
Of course @eternaltyro, no rush at all! Send me a message over any platform when you have time for this 👍
I'll try to find time to revise this PR soon (particularly the dockerfile). There are a few refinements I could include, plus Python 3.10.
I'll try to find time to revise this PR soon (particularly the dockerfile). There are a few refinements I could include, plus Python 3.10.
Thank you, @spwoodcock . Let me know once you do, and I'll give it another review.
I should get around to updating this at the weekend - got a pretty full work week ahead 👍
@spwoodcock I think the changes are good to go from our end. Please let us know once you have resolved any conflicts and we'll give it a review again.
I'll be working on this now - will update soon.
As we are using Python image 3.10-alpine3.17, the system proj version is 9.x. I had to update pyproj to 3.5.0. As far as I can see there are few breaking changes: https://pyproj4.github.io/pyproj/stable/history.html#id13 so this shouldn't be an issue.
It's not possible for us to update to Python 3.10 due to the schematics
dependency.
The Iterable
class was removed in Python 3.10.
Traceback:
File "/usr/src/app/manage.py", line 11, in <module>
from backend.services.users.authentication_service import AuthenticationService
File "/usr/src/app/backend/services/users/authentication_service.py", line 9, in <module>
from backend.services.messaging.message_service import MessageService
File "/usr/src/app/backend/services/messaging/message_service.py", line 12, in <module>
from backend.models.dtos.message_dto import MessageDTO, MessagesDTO
File "/usr/src/app/backend/models/dtos/message_dto.py", line 1, in <module>
from schematics import Model
File "/home/appuser/.local/lib/python3.10/site-packages/schematics/__init__.py", line 6, in <module>
from . import deprecated
File "/home/appuser/.local/lib/python3.10/site-packages/schematics/deprecated.py", line 8, in <module>
from .types.serializable import Serializable
File "/home/appuser/.local/lib/python3.10/site-packages/schematics/types/__init__.py", line 2, in <module>
from .base import *
File "/home/appuser/.local/lib/python3.10/site-packages/schematics/types/base.py", line 19, in <module>
from collections import Iterable, OrderedDict
ImportError: cannot import name 'Iterable' from 'collections' (/usr/local/lib/python3.10/collections/__init__.py)
I will revert to Python 3.9.
The schematics
modelling library is unmaintained. We should refactor to remove the dependency prior to upgrade to Python 3.10. I made an issue #5725
I have updated the dockerfile and pyproject.toml, plus docs and CircleCI. As well as PDM, I also made the container run as a non-root user for security. I also separated the final build target into debug and prod stages (default to prod), to allow for easier container debugging. Please review and let me know if I missed anything.
I think CircleCI is failing because there have been substantial changes to the CI config since I made this PR.
It seems to be running on the old config.yaml for CircleCI, not sure why (my version is in sync with develop
, aside from two lines edited.
On a side note: does tasking manager currently run in a Kubernetes cluster, or with the docker-compose config?
If it runs with the docker-compose.yml in the root of the repo, then I would update the gunicorn --workers
value to 4
.
If Kubernetes, then 1 worker is ideal (no change required).
With the currently dependency versions we can only use Python 3.9. To upgrade to Python 3.10, I suggest we:
- Merge this PR.
- Selectively merge #5642:
- All code edits.
- Dependency versions, for compatibility with Python 3.10.
- Skip edits to dockerfiles.
- Import the new dependency versions using
pdm import -f requirements requirements.txt
. - Remove
requirements.txt
. - Split out the dev dependencies
pdm add --dev ... ... ...
.
A note for the future:
- I kept Python with an alpine base as that was used before & I didn't want to be too obnoxious with my edits.
- However, Debian 11 would be a better base for a Python image, as Alpine isn't a great combo with Python: https://pythonspeed.com/articles/alpine-docker-python/
As we are close to the point of merging, I'm not suggesting we change now, but this could be a future PR.
@spwoodcock Let's resolve the merge conflicts, and then i will give this a final review.
I kept Python with an alpine base as that was used before & I didn't want to be too obnoxious with my edits. However, Debian 11 would be a better base for a Python image, as Alpine isn't a great combo with Python: https://pythonspeed.com/articles/alpine-docker-python/
Agreed. Let's deal with this later.
@Aadesh-Baral the PR removes requirements.txt
. Is there a way we can negotiate this conflict from our end?
In that case, lets update pyproject.toml with latest requirements.txt.
Let me know if there is anything I can do to help.
I could add the requirements file back in for now.
@spwoodcock Yes, please. Let's get this thing done.
edit: Just to clarify, pyproject.toml helps like Aadesh pointed out. I will leave it to you to decide how to resolve the requirements.txt conflict. We can remove it later in a different commit if you choose to bring that back in.