[ON HOLD] Django settings cleanup, logging improvements and critical fixes
NOTE: This needs approval from ALL reviewers before merge.
What
- Segregate the Django settings required for `backend` versus Celery powered services (`execution-consumer`, `celery-flower` and `celery-beat`).
- Improve logging in some cases.
- Fix issue where pointing to a new Django settings module in `backend/.env` was not picked up.
- Fix issue where `backend` was breaking when run directly in local due to a version mismatch of some protocol buffers generated code. THIS FIXES CONTRIBUTOR INSTALLATIONS.
- Allow building Docker images locally from the current branch. BENEFITS CONTRIBUTORS.
- Ensure another virtual env is not active while invoking dev env CLI commands. BENEFITS CONTRIBUTORS.
- Update `pdm.lock` files for `backend` and `prompt-service`.
Why
- Integrating the Celery distributed queue with Django is beneficial, but the Django settings need to be isolated by component to ensure only required modules are loaded by Django. For example, various `INSTALLED_APPS`, plugins, etc. for `backend` need not be loaded for Celery.
- Renaming `settings/dev.py` to `settings/platform.py` might be more apt, as gradually we can keep all common settings in `base.py`, and unique ones in `platform.py` and `celery.py` respectively.
- Adding `settings/celery.py` keeps all Celery related settings in one place and also eliminates the need for `flowerconfig.py`.
- Adding `.celery.env` ensures the env is not polluted.
- If fetching user orgs failed due to some reason, the error was not logged, making it difficult to debug.
- `DJANGO_SETTINGS_MODULE` should not be set in random places but ONLY in these four main entrypoints, BEFORE importing anything else (see the sketch after this list):
  - `backend/asgi.py` (production ASGI)
  - `backend/wsgi.py` (production WSGI)
  - `manage.py` (development server)
  - `backend/celery.py` (Celery powered services)
- Protocol buffers by nature generates stub code for client and server integration from a `.proto` file. However, this stub code needs to be regenerated should the protocol buffers dependency version change. The easier alternative is the `PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python` env setting, which lets Python internally handle the stub code generation, although it is relatively slower; this is what we do. However, the other issue where the env vars and Django settings got loaded out of order caused the mentioned protocol buffers env var to be unset, thus breaking `backend`.
- Contributors should be able to run services directly in local, always.
- The evaluation server, if not running, will throw warnings, which could be suppressed by default to remove noise from the logs. We could have an option to show the warnings when required, which will continue to help in debugging.
- A local build of Docker images should be allowed to happen from any feature branch instead of `main` only.
- Executing `dev-env-cli.sh` while another virtual env is already activated will cause undesired effects, hence this needs to be checked.
- The `pdm.lock` files from the base branch caused resolution errors, hence they needed to be regenerated manually.
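For illustration, a minimal sketch of the intended load order in an entrypoint such as `manage.py`, assuming `python-dotenv` is used to load the env file (the exact helper used by the repo may differ):

```python
#!/usr/bin/env python
"""Sketch of an entrypoint (e.g. manage.py): env vars and the Django
settings module must be in place BEFORE any other project module
(or Django itself) is imported."""
import os
import sys

from dotenv import load_dotenv  # assumes python-dotenv; the repo's helper may differ

# 1. Load backend/.env first so values such as
#    PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python exist before any
#    generated protobuf stub code gets imported.
load_dotenv()

# 2. Respect DJANGO_SETTINGS_MODULE from .env, falling back to the platform settings.
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "backend.settings.platform")


def main() -> None:
    # 3. Only now import Django, which reads DJANGO_SETTINGS_MODULE on setup.
    from django.core.management import execute_from_command_line

    execute_from_command_line(sys.argv)


if __name__ == "__main__":
    main()
```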
How
Features
- Add error logging for fetching user orgs
- Improve logging and docs
- Add `settings/celery.py`
- Add `.celery.env` and corresponding `.sample.celery.env`
- Add `backend/celery.py` entrypoint for `execution-consumer`, `celery-flower` and `celery-beat` (see the sketch after this list)
- Rename celery instance to `celery` to avoid confusion
- Refactor celery services config
- Rename `settings/dev.py` to `settings/platform.py`
- Remove django settings module update from utils
- Remove unnecessary `flowerconfig.py`
- Add env var to suppress evaluation server warnings
- Add check for active venv in `dev-env-cli.sh`
- Cleanup `run-platform.sh`
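A minimal sketch of what the `backend/celery.py` entrypoint could look like, assuming `python-dotenv` loads `.celery.env`; module paths follow the names used in this PR, but the exact contents are an approximation, not the actual implementation:

```python
"""Sketch of backend/celery.py: single Celery entrypoint reused by
execution-consumer, celery-flower and celery-beat."""
import os

from dotenv import load_dotenv  # assumes python-dotenv

# Load Celery-specific env vars and pick the Celery settings module
# BEFORE importing anything that touches Django settings.
load_dotenv(".celery.env")
os.environ.setdefault("DJANGO_SETTINGS_MODULE", "backend.settings.celery")

from celery import Celery  # noqa: E402  (deliberately imported after env setup)

# Instance is named `celery` so that `celery -A backend ...` finds it.
celery = Celery("backend")
celery.config_from_object("django.conf:settings", namespace="CELERY")
celery.autodiscover_tasks()
```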
Fixes
- Load env vars and Django settings in proper order and BEFORE other modules are imported in all `backend` entrypoints.
Can this PR break any existing features? If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)
- YES, this refactors the Django settings for `backend` (v1 and v2), `execution-consumer`, `celery-flower` and `celery-beat`.
Database Migrations
Env Config
- New env var `EVALUATION_SERVER_WARNINGS` (default `'false'`) in `backend/.env` (see the sketch after this list).
- New default value of `DJANGO_SETTINGS_MODULE='backend.settings.platform'`.
- New env file `backend/.celery.env`.
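A hedged sketch of how the new flag could gate the evaluation server warnings; the function and logger names below are illustrative, not the actual code:

```python
import logging
import os

logger = logging.getLogger(__name__)


def evaluation_server_warnings_enabled() -> bool:
    # EVALUATION_SERVER_WARNINGS defaults to 'false', keeping logs quiet by default.
    return os.environ.get("EVALUATION_SERVER_WARNINGS", "false").lower() == "true"


def warn_evaluation_server_unreachable(reason: str) -> None:
    # Hypothetical helper: emit the warning only when explicitly enabled for debugging.
    if evaluation_server_warnings_enabled():
        logger.warning("Evaluation server is not reachable: %s", reason)
```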
Relevant Docs
Related Issues or PRs
Dependencies Versions
Notes on Testing
The following modes of invocation seem to be running fine:
| Component | Command (after `cd backend && . .venv/bin/activate`) | Running mode |
|---|---|---|
| backend via gunicorn | `./entrypoint.sh` | Direct in local |
| backend development server | `python manage.py runserver 0.0.0.0:8000` | Direct in local |
| execution consumer | `.venv/bin/celery -A backend worker --loglevel=info -Q celery,celery_periodic_logs,celery_log_task_queue --autoscale=8,1` | Direct in local |
| celery beat | `.venv/bin/celery -A backend beat --scheduler django_celery_beat.schedulers:DatabaseScheduler -l INFO` | Direct in local |
| celery flower | `.venv/bin/celery -A backend flower --port=5555 --purge_offline_workers=5` | Direct in local |
| all | `./run-platform.sh -u -b -v current` | In Docker containers |
Screenshots
Checklist
I have read and understood the Contribution Guidelines.
@muhammad-ali-e Observed below error in execution-consumer though:
[2024-08-16 05:35:12,761: ERROR/MainProcess] Received unregistered task of type 'workflow_manager.workflow.execution_log_utils.consume_log_history'.
The message has been ignored and discarded.
Did you remember to import the module containing this task?
Or maybe you're using relative imports?
Please see
https://docs.celeryq.dev/en/latest/internals/protocol.html
for more information.
The full contents of the message body was:
b'[[], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]' (77b)
The full contents of the message headers:
{'lang': 'py', 'task': 'workflow_manager.workflow.execution_log_utils.consume_log_history', 'id': '57b8ce11-0a9b-4057-a825-fa3f752fae11', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None], 'root_id': '57b8ce11-0a9b-4057-a825-fa3f752fae11', 'parent_id': None, 'argsrepr': '()', 'kwargsrepr': '{}', 'origin': 'gen1@7747504f1b7b', 'ignore_result': False, 'replaced_task_nesting': 0, 'stamped_headers': None, 'stamps': {}}
The delivery info for this task is:
{'exchange': '', 'routing_key': 'celery_periodic_logs'}
Traceback (most recent call last):
File "/app/.venv/lib/python3.9/site-packages/celery/worker/consumer/consumer.py", line 659, in on_task_received
strategy = strategies[type_]
KeyError: 'workflow_manager.workflow.execution_log_utils.consume_log_history'
Seems like one import was missed in the backend entrypoint for Celery based services after the changes. Please review.
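If a missed import is indeed the cause, something along these lines appended to the `backend/celery.py` entrypoint (sketched earlier) should get the task registered; `celery` is assumed to be the app instance from this PR, and the module path is taken from the error above:

```python
# Sketch only: make sure the worker imports the module that defines the task,
# either explicitly or via autodiscovery with a non-default related_name.

# Option 1: explicit import so its @shared_task functions register on startup.
import workflow_manager.workflow.execution_log_utils  # noqa: F401

# Option 2: extend autodiscovery to also look for `execution_log_utils` modules.
celery.autodiscover_tasks(
    packages=["workflow_manager.workflow"],
    related_name="execution_log_utils",
)
```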
@tahierhussain Observed that the static URL for the logo was throwing a "not found" error on the login page after the changes:
Please review.
@gaya3-zipstack @kirtimanmishrazipstack The `pdm.lock` files from the base branch caused resolution errors, hence they needed to be regenerated manually. We might need to check whether the PDM lock PR agent previously didn't run or exited with an error.
Please review.
> @muhammad-ali-e Observed below error in `execution-consumer` though:
>
> [2024-08-16 05:35:12,761: ERROR/MainProcess] Received unregistered task of type 'workflow_manager.workflow.execution_log_utils.consume_log_history'. The message has been ignored and discarded. Did you remember to import the module containing this task? Or maybe you're using relative imports?
@hari-kuriakose
It looks like there's a task stored in the database with the task path `workflow_manager.workflow.execution_log_utils.consume_log_history`. Celery beat is triggering this task, but the consumer isn't finding any registered task with this name when it receives the request.
Additionally, we have some tasks (methods/functions) in our backend service that use the `shared_task` decorator. These tasks are used both by our Django backend and by Celery. They should be registered with Celery when they run, since in the current setup the backend code is also part of the consumer's (Celery's) code.
Are we planning to separate the consumer from the backend code?
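For context, a minimal, hypothetical sketch of such a `shared_task`: it is only registered under the name in the error above if the worker actually imports its module (the body is a placeholder, not the real implementation):

```python
# Hypothetical sketch of workflow_manager/workflow/execution_log_utils.py
from celery import shared_task


@shared_task
def consume_log_history() -> None:
    # Placeholder body; the real implementation lives in the backend code.
    # The task name becomes
    # 'workflow_manager.workflow.execution_log_utils.consume_log_history'
    # only when the worker actually imports this module.
    ...
```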
| filepath | function | $$\textcolor{#23d18b}{\tt{passed}}$$ | SUBTOTAL |
|---|---|---|---|
| $$\textcolor{#23d18b}{\tt{worker/src/unstract/worker/clients/test\_docker.py}}$$ | $$\textcolor{#23d18b}{\tt{test\_logs}}$$ | $$\textcolor{#23d18b}{\tt{1}}$$ | $$\textcolor{#23d18b}{\tt{1}}$$ |
| $$\textcolor{#23d18b}{\tt{worker/src/unstract/worker/clients/test\_docker.py}}$$ | $$\textcolor{#23d18b}{\tt{test\_cleanup}}$$ | $$\textcolor{#23d18b}{\tt{1}}$$ | $$\textcolor{#23d18b}{\tt{1}}$$ |
| $$\textcolor{#23d18b}{\tt{worker/src/unstract/worker/clients/test\_docker.py}}$$ | $$\textcolor{#23d18b}{\tt{test\_cleanup\_skip}}$$ | $$\textcolor{#23d18b}{\tt{1}}$$ | $$\textcolor{#23d18b}{\tt{1}}$$ |
| $$\textcolor{#23d18b}{\tt{worker/src/unstract/worker/clients/test\_docker.py}}$$ | $$\textcolor{#23d18b}{\tt{test\_client\_init}}$$ | $$\textcolor{#23d18b}{\tt{1}}$$ | $$\textcolor{#23d18b}{\tt{1}}$$ |
| $$\textcolor{#23d18b}{\tt{worker/src/unstract/worker/clients/test\_docker.py}}$$ | $$\textcolor{#23d18b}{\tt{test\_get\_image\_exists}}$$ | $$\textcolor{#23d18b}{\tt{1}}$$ | $$\textcolor{#23d18b}{\tt{1}}$$ |
| $$\textcolor{#23d18b}{\tt{worker/src/unstract/worker/clients/test\_docker.py}}$$ | $$\textcolor{#23d18b}{\tt{test\_get\_image}}$$ | $$\textcolor{#23d18b}{\tt{1}}$$ | $$\textcolor{#23d18b}{\tt{1}}$$ |
| $$\textcolor{#23d18b}{\tt{worker/src/unstract/worker/clients/test\_docker.py}}$$ | $$\textcolor{#23d18b}{\tt{test\_get\_container\_run\_config}}$$ | $$\textcolor{#23d18b}{\tt{1}}$$ | $$\textcolor{#23d18b}{\tt{1}}$$ |
| $$\textcolor{#23d18b}{\tt{worker/src/unstract/worker/clients/test\_docker.py}}$$ | $$\textcolor{#23d18b}{\tt{test\_get\_container\_run\_config\_without\_mount}}$$ | $$\textcolor{#23d18b}{\tt{1}}$$ | $$\textcolor{#23d18b}{\tt{1}}$$ |
| $$\textcolor{#23d18b}{\tt{worker/src/unstract/worker/clients/test\_docker.py}}$$ | $$\textcolor{#23d18b}{\tt{test\_run\_container}}$$ | $$\textcolor{#23d18b}{\tt{1}}$$ | $$\textcolor{#23d18b}{\tt{1}}$$ |
| $$\textcolor{#23d18b}{\tt{TOTAL}}$$ | | $$\textcolor{#23d18b}{\tt{9}}$$ | $$\textcolor{#23d18b}{\tt{9}}$$ |
