unstract icon indicating copy to clipboard operation
unstract copied to clipboard

[ON HOLD] Django settings cleanup, logging improvements and critical fixes

Open hari-kuriakose opened this issue 1 year ago • 7 comments

NOTE: This needs approval from ALL reviewers before merge.

What

  • Segregate the Django settings required for backend versus Celery powered services (execution-consumer, celery-flower and celery-beat).
  • Improve logging in some cases.
  • Fix issue where pointing to a new Django settings module in backend/.env was not picked up.
  • Fix issue where backend was breaking when directly running in local due to version mismatch of some protocol buffers generated code. THIS FIXES CONTRIBUTOR INSTALLATIONS.
  • Allow building docker images locally from current branch. BENEFITS CONTRIBUTORS.
  • Ensure another virtual env is not active while invoking dev env CLI commands. BENEFITS CONTRIBUTORS.
  • Update pdm.lock files for backend and prompt-service.

Why

  • Integrating Celery distributed queue with Django is beneficial but Django settings need to be isolated by component, to ensure only required modules are loaded by Django. For example, various INSTALLED_APPS, plugins, etc for backend need not be loaded for Celery.

    • Renaming settings/dev.py to settings/platform.py might be more apt, as gradually we can keep all common settings in base.py, and unique ones in platform.py and celery.py respectively.
    • Adding settings/celery.py keeps all Celery related settings in one place and also eliminates the need for flowerconfig.py.
    • Adding .celery.env ensures env is not polluted.
  • If fetching user orgs failed due to some reason, the error was not logged making it difficult to debug.

  • DJANGO_SETTINGS_MODULE should not be set in random places but ONLY in these four main entrypoints, BEFORE importing anything else:

    • backend/asgi.py (production ASGI)
    • backend/wsgi.py (production WSGI)
    • manage.py (development server)
    • backend/celery.py (Celery powered services)
  • Protocol buffers by nature generates some stub code for client and server integration from a .proto file. However, this stub code needs to be regenerated should the protocol buffers dependency version change. The easier alternative is to have PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python env setting which lets Python internally handle the stub code generation although relatively slower. This is what we do. However the other issue where the env vars and Django settings got loaded out of order caused mentioned protocol buffers env var to be unset, thus breaking backend.

  • Contributors should be able to run services directly in local always.

  • Evaluation server if not running will throw warnings, which could be suppressed by default to remove noise from the logs. We could have an option to show warnings when required, which will continue to help in debugging.

  • A local build of Docker images should be allowed to happen from any feature branch instead of main only.

  • Executing dev-env-cli.sh while another virtual env is already activated will cause undesired effects, hence need to be checked.

  • pdm.lock files base branch caused resolution errors, hence needed to be regenerated manually.

How

Features

  • Add error logging for fetching user orgs
  • Improve logging and docs
  • Add settings/celery.py
  • Add .celery.env and corresponding .sample.celery.env
  • Add backend/celery.py entrypoint for execution-consumer, celery-flower and celery-beat
  • Rename celery instance to celery to avoid confusion
  • Refactor celery services config
  • Rename settings/dev.py to settings/platform.py
  • Remove django settings module update from utils
  • Remove unnecessary flowerconfig.py
  • Add env var to suppress evaluation server warnings
  • Add check for active venv in dev-env-cli.sh
  • Cleanup run-platform.sh

Fixes

  • Load env vars and Django settings in properorder and BEFORE other modules are imported in all backend entrypoints.

Can this PR break any existing features. If yes, please list possible items. If no, please explain why. (PS: Admins do not merge the PR without this section filled)

  • YES, this refactors the Django settings for backend (v1 and v2), execution-consumer, celery-flower and celery-beat.

Database Migrations

Env Config

  • New env var EVALUATION_SERVER_WARNINGS (default 'false') in backend/.env.
  • New default value of DJANGO_SETTINGS_MODULE='backend.settings.platform'.
  • New env file backend/.celery.env.

Relevant Docs

Related Issues or PRs

Dependencies Versions

Notes on Testing

The following modes of invocations seem to be running fine:

Component Command (after cd backend && . .venv/bin/activate) Running mode
backend via gunicorn ./entrypoint.sh Direct in local
backend development server python manage.py runserver 0.0.0.0:8000 Direct in local
execution consumer .venv/bin/celery -A backend worker --loglevel=info -Q celery,celery_periodic_logs,celery_log_task_queue --autoscale=8,1 Direct in local
celery beat .venv/bin/celery -A backend beat --scheduler django_celery_beat.schedulers:DatabaseScheduler -l INFO Direct in local
celery flower .venv/bin/celery -A backend flower --port=5555 --purge_offline_workers=5 Direct in local
all ./run-platform.sh -u -b -v current In Docker containers

Screenshots

Checklist

I have read and understood the Contribution Guidelines.

hari-kuriakose avatar Aug 15 '24 17:08 hari-kuriakose

@muhammad-ali-e Observed below error in execution-consumer though:

[2024-08-16 05:35:12,761: ERROR/MainProcess] Received unregistered task of type 'workflow_manager.workflow.execution_log_utils.consume_log_history'.
The message has been ignored and discarded.

Did you remember to import the module containing this task?
Or maybe you're using relative imports?

Please see
https://docs.celeryq.dev/en/latest/internals/protocol.html
for more information.

The full contents of the message body was:
b'[[], {}, {"callbacks": null, "errbacks": null, "chain": null, "chord": null}]' (77b)

The full contents of the message headers:
{'lang': 'py', 'task': 'workflow_manager.workflow.execution_log_utils.consume_log_history', 'id': '57b8ce11-0a9b-4057-a825-fa3f752fae11', 'shadow': None, 'eta': None, 'expires': None, 'group': None, 'group_index': None, 'retries': 0, 'timelimit': [None, None],'root_id': '57b8ce11-0a9b-4057-a825-fa3f752fae11', 'parent_id': None, 'argsrepr': '()', 'kwargsrepr': '{}', 'origin': 'gen1@7747504f1b7b', 'ignore_result': False, 'replaced_task_nesting': 0, 'stamped_headers': None, 'stamps': {}}

The delivery info for this task is:
{'exchange': '', 'routing_key': 'celery_periodic_logs'}
Traceback (most recent call last):
File "/app/.venv/lib/python3.9/site-packages/celery/worker/consumer/consumer.py", line 659, in on_task_received
strategy = strategies[type_]
KeyError: 'workflow_manager.workflow.execution_log_utils.consume_log_history'

Seems like one import was missed in the backend entrypoint for Celery based services after the changes. Please review.

hari-kuriakose avatar Aug 16 '24 10:08 hari-kuriakose

@tahierhussain Observed that static URL for logo was throwing not found in the login page after the changes:

oss_login_no_logo

Please review.

hari-kuriakose avatar Aug 16 '24 10:08 hari-kuriakose

@gaya3-zipstack @kirtimanmishrazipstack pdm.lock files from base branch caused resolution errors, hence needed to be regenerated manually. We might need to see if previously if the PDM lock PR agent didn't run or exited with error.

Please review.

hari-kuriakose avatar Aug 16 '24 10:08 hari-kuriakose

Quality Gate Failed Quality Gate failed

Failed conditions
3 Security Hotspots

See analysis details on SonarCloud

sonarqubecloud[bot] avatar Aug 16 '24 15:08 sonarqubecloud[bot]

@muhammad-ali-e Observed below error in execution-consumer though:

[2024-08-16 05:35:12,761: ERROR/MainProcess] Received unregistered task of type 'workflow_manager.workflow.execution_log_utils.consume_log_history'.
The message has been ignored and discarded.

Did you remember to import the module containing this task?
Or maybe you're using relative imports?

@hari-kuriakose It looks like there's a task stored in the database with the task path workflow_manager.workflow.execution_log_utils.consume_log_history. celery beat is triggering this task, but the consumer isn’t finding any registered task with this name when it receives the request.

Additionally, we have some tasks (methods/functions) in our backend service that use the shared_task decorator. These tasks are used both by our Django backend and by Celery. They should be registered with Celery when they run, since in the current setup, the backend code is also part of the consumer's (Celery's) code.

Are we planning to separate the consumer from the backend code?

muhammad-ali-e avatar Aug 21 '24 09:08 muhammad-ali-e

Quality Gate Failed Quality Gate failed

Failed conditions
3 Security Hotspots

See analysis details on SonarCloud

sonarqubecloud[bot] avatar Nov 08 '24 10:11 sonarqubecloud[bot]

filepath function $$\textcolor{#23d18b}{\tt{passed}}$$ SUBTOTAL
$$\textcolor{#23d18b}{\tt{worker/src/unstract/worker/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_logs}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{worker/src/unstract/worker/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_cleanup}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{worker/src/unstract/worker/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_cleanup\_skip}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{worker/src/unstract/worker/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_client\_init}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{worker/src/unstract/worker/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_get\_image\_exists}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{worker/src/unstract/worker/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_get\_image}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{worker/src/unstract/worker/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_get\_container\_run\_config}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{worker/src/unstract/worker/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_get\_container\_run\_config\_without\_mount}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{worker/src/unstract/worker/clients/test\_docker.py}}$$ $$\textcolor{#23d18b}{\tt{test\_run\_container}}$$ $$\textcolor{#23d18b}{\tt{1}}$$ $$\textcolor{#23d18b}{\tt{1}}$$
$$\textcolor{#23d18b}{\tt{TOTAL}}$$ $$\textcolor{#23d18b}{\tt{9}}$$ $$\textcolor{#23d18b}{\tt{9}}$$

github-actions[bot] avatar Nov 08 '24 10:11 github-actions[bot]