tasking-manager icon indicating copy to clipboard operation
tasking-manager copied to clipboard

build: update backend dependency management to pdm + dockerfile

Open spwoodcock opened this issue 2 years ago • 9 comments

Closes #4642

Looks like this issue has been open for a while.

Great suggestion by @frafra to use PDM. Thank you.

Notes:

  • I updated the Dockerfile.backend to reflect the changes.
  • I also added the --no-cache flag to the final stage apk install command to save some space.
  • I pinned the Python version number for version solving to >=3.7,<=3.8.
  • The values for pyproject.toml were taken from the repo. Let me know if they need to change.

spwoodcock avatar Jun 15 '22 22:06 spwoodcock

This is only a partial solution, as there are multiple references to requirements.txt throughout the repo.

python-venv is also used, but is now redundant with using pdm.

I have a commit on this branch that I have not pushed yet, removing requirements.txt and modifying any docs / workflows with reference to it (14 files). Let me know if you would like me to push.

spwoodcock avatar Jun 15 '22 23:06 spwoodcock

Thanks for the PR @swoodcoc. I think it will be easier to review if you push it and maybe you can also describe what the user needs to do to set up the project (like instead of pip install -r requirements.txt) and how this change benefits the project.

Aadesh-Baral avatar Jun 16 '22 05:06 Aadesh-Baral

No worries, I'll add it now. I was worried it would increase the scope, as it also affects CI and deployment instructions.

I have updated all references to requirements.txt and using venv that I could find in the repo.

As for the benefit:

  • PDM uses a dependency solver to check for compatibility between all dependencies.
  • If a package is updated in the future that is incompatible with other package version, then it should be flagged during build.
  • I also think PEP 582 dependency management is a lot simpler and reduces the need for virtual environments.

spwoodcock avatar Jun 16 '22 09:06 spwoodcock

So it actually looks like you can't pip install dependencies directly from a pyproject.toml (I have never had to before). There is a related issue https://github.com/pypa/pip/issues/8049

I wanted to keep the edits to CI etc simple by pip installing the already solved dependency list.

To achieve this we can

pip install toml && python -c 'import toml; c = toml.load("pyproject.toml")
print("\n".join(c["project"]["dependencies"]))' | pip install -r /dev/stdin

I will update.

spwoodcock avatar Jun 16 '22 09:06 spwoodcock

Everything should be complete to the best of my knowledge.

There seem to be quite a few Dockerfiles for the backend to maintain - I updated them all, as I wasn't sure which ones are redundant or still used by external processes.

spwoodcock avatar Jun 16 '22 10:06 spwoodcock

Also on a side note: the Dockerfile using quay.io/hotosm/base-python-image as a base uses Python 3.8.7, while the Dockerfile.backend uses 3.7. Perhaps a refactor is required to manage a consistent Python version.

The dependency install worked without issue (no conflicts) using Python 3.9, so I feel like both could be updated to that anyway.

spwoodcock avatar Jun 16 '22 10:06 spwoodcock

Thanks for the feedback, I'm more than happy to go through the docs and update as requested.

As for the package installs, are you able to pip install the packages, but not pdm install? I would be surprised if this is the case.

I imagine the reason install failed is because of missing libraries on your Linux system, such as proj-dev.

A local install requires all the libs packaged into the docker image to be installed on your machine: https://github.com/hotosm/tasking-manager/blob/4757148e011f2c66f3452a588379e67daef0e3af/Dockerfile#L11-L22

The Dockerfile above uses Alpine, so the packages would be named differently for example on a Debian based distro (e.g. Ubuntu). proj-dev --> libproj-dev etc.

spwoodcock avatar Jun 20 '22 10:06 spwoodcock

I rebased and added the suggested changes to the docs.

spwoodcock avatar Jun 21 '22 10:06 spwoodcock

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

sonarqubecloud[bot] avatar Jul 21 '22 10:07 sonarqubecloud[bot]

Related to #5705

eternaltyro avatar Apr 13 '23 18:04 eternaltyro

I could update this PR and rebase with the current develop, if of interest.

spwoodcock avatar Apr 14 '23 02:04 spwoodcock

Thank you, @spwoodcock . We are currently swamped and unable to review your upgrades immediately. Can you please give us a week or two to get develop branch cleaned-up and in order? I will keep you posted or you can follow the issues here on GH.

eternaltyro avatar Apr 14 '23 06:04 eternaltyro

@ramyaragupathy @Aadesh-Baral let us discuss this in our next meeting and go over timelines and effort involved.

eternaltyro avatar Apr 14 '23 06:04 eternaltyro

Of course @eternaltyro, no rush at all! Send me a message over any platform when you have time for this 👍

spwoodcock avatar Apr 14 '23 07:04 spwoodcock

I'll try to find time to revise this PR soon (particularly the dockerfile). There are a few refinements I could include, plus Python 3.10.

spwoodcock avatar Apr 15 '23 12:04 spwoodcock

I'll try to find time to revise this PR soon (particularly the dockerfile). There are a few refinements I could include, plus Python 3.10.

Thank you, @spwoodcock . Let me know once you do, and I'll give it another review.

eternaltyro avatar Apr 15 '23 13:04 eternaltyro

I should get around to updating this at the weekend - got a pretty full work week ahead 👍

spwoodcock avatar Apr 17 '23 15:04 spwoodcock

@spwoodcock I think the changes are good to go from our end. Please let us know once you have resolved any conflicts and we'll give it a review again.

eternaltyro avatar Apr 20 '23 18:04 eternaltyro

I'll be working on this now - will update soon.

spwoodcock avatar Apr 21 '23 13:04 spwoodcock

As we are using Python image 3.10-alpine3.17, the system proj version is 9.x. I had to update pyproj to 3.5.0. As far as I can see there are few breaking changes: https://pyproj4.github.io/pyproj/stable/history.html#id13 so this shouldn't be an issue.

spwoodcock avatar Apr 21 '23 16:04 spwoodcock

It's not possible for us to update to Python 3.10 due to the schematics dependency. The Iterable class was removed in Python 3.10.

Traceback:

  File "/usr/src/app/manage.py", line 11, in <module>
    from backend.services.users.authentication_service import AuthenticationService
  File "/usr/src/app/backend/services/users/authentication_service.py", line 9, in <module>
    from backend.services.messaging.message_service import MessageService
  File "/usr/src/app/backend/services/messaging/message_service.py", line 12, in <module>
    from backend.models.dtos.message_dto import MessageDTO, MessagesDTO
  File "/usr/src/app/backend/models/dtos/message_dto.py", line 1, in <module>
    from schematics import Model
  File "/home/appuser/.local/lib/python3.10/site-packages/schematics/__init__.py", line 6, in <module>
    from . import deprecated
  File "/home/appuser/.local/lib/python3.10/site-packages/schematics/deprecated.py", line 8, in <module>
    from .types.serializable import Serializable
  File "/home/appuser/.local/lib/python3.10/site-packages/schematics/types/__init__.py", line 2, in <module>
    from .base import *
  File "/home/appuser/.local/lib/python3.10/site-packages/schematics/types/base.py", line 19, in <module>
    from collections import Iterable, OrderedDict
ImportError: cannot import name 'Iterable' from 'collections' (/usr/local/lib/python3.10/collections/__init__.py)

I will revert to Python 3.9.

The schematics modelling library is unmaintained. We should refactor to remove the dependency prior to upgrade to Python 3.10. I made an issue #5725

spwoodcock avatar Apr 21 '23 16:04 spwoodcock

I have updated the dockerfile and pyproject.toml, plus docs and CircleCI. As well as PDM, I also made the container run as a non-root user for security. I also separated the final build target into debug and prod stages (default to prod), to allow for easier container debugging. Please review and let me know if I missed anything.

I think CircleCI is failing because there have been substantial changes to the CI config since I made this PR. It seems to be running on the old config.yaml for CircleCI, not sure why (my version is in sync with develop, aside from two lines edited.

On a side note: does tasking manager currently run in a Kubernetes cluster, or with the docker-compose config? If it runs with the docker-compose.yml in the root of the repo, then I would update the gunicorn --workers value to 4. If Kubernetes, then 1 worker is ideal (no change required).

spwoodcock avatar Apr 21 '23 17:04 spwoodcock

With the currently dependency versions we can only use Python 3.9. To upgrade to Python 3.10, I suggest we:

  1. Merge this PR.
  2. Selectively merge #5642:
  • All code edits.
  • Dependency versions, for compatibility with Python 3.10.
  • Skip edits to dockerfiles.
  1. Import the new dependency versions using pdm import -f requirements requirements.txt.
  2. Remove requirements.txt.
  3. Split out the dev dependencies pdm add --dev ... ... ....

spwoodcock avatar Apr 27 '23 18:04 spwoodcock

A note for the future:

  • I kept Python with an alpine base as that was used before & I didn't want to be too obnoxious with my edits.
  • However, Debian 11 would be a better base for a Python image, as Alpine isn't a great combo with Python: https://pythonspeed.com/articles/alpine-docker-python/

As we are close to the point of merging, I'm not suggesting we change now, but this could be a future PR.

spwoodcock avatar May 10 '23 07:05 spwoodcock

@spwoodcock Let's resolve the merge conflicts, and then i will give this a final review.

Aadesh-Baral avatar May 10 '23 08:05 Aadesh-Baral

I kept Python with an alpine base as that was used before & I didn't want to be too obnoxious with my edits.

However, Debian 11 would be a better base for a Python image, as Alpine isn't a great combo with Python: https://pythonspeed.com/articles/alpine-docker-python/

Agreed. Let's deal with this later.

eternaltyro avatar May 10 '23 10:05 eternaltyro

@Aadesh-Baral the PR removes requirements.txt. Is there a way we can negotiate this conflict from our end?

eternaltyro avatar May 10 '23 10:05 eternaltyro

In that case, lets update pyproject.toml with latest requirements.txt.

Aadesh-Baral avatar May 10 '23 10:05 Aadesh-Baral

Let me know if there is anything I can do to help.

I could add the requirements file back in for now.

spwoodcock avatar May 10 '23 11:05 spwoodcock

@spwoodcock Yes, please. Let's get this thing done.

edit: Just to clarify, pyproject.toml helps like Aadesh pointed out. I will leave it to you to decide how to resolve the requirements.txt conflict. We can remove it later in a different commit if you choose to bring that back in.

eternaltyro avatar May 10 '23 11:05 eternaltyro