awx-operator icon indicating copy to clipboard operation
awx-operator copied to clipboard

AWX migration to external postgres fails

Open jonathon2nd opened this issue 1 year ago • 11 comments

Please confirm the following

  • [X] I agree to follow this project's code of conduct.
  • [X] I have checked the current issues for duplicates.
  • [X] I understand that the AWX Operator is open source software provided for free and that I might not receive a timely response.

Bug Summary

When attempting to migrate from AWX on vm to AWX in k8s with an externally managed postgres cluster, it fails.

AWX Operator version

awx-operator:2.1.0

AWX version

latest

Kubernetes platform

kubernetes

Kubernetes/Platform version

v1.24.8 +rke2r1

Modifications

no

Steps to reproduce

Deploy

---
apiVersion: awx.ansible.com/v1beta1
kind: AWX
metadata:
  name: awx
  namespace: awx
spec:
  secret_key_secret: awx-secret-key
  old_postgres_configuration_secret: awx-old-postgres-configuration
  postgres_configuration_secret: awx-postgres-configuration
  projects_persistence: true
  projects_storage_class: linstor-replica-three
  projects_storage_size: 10Gi
  projects_storage_access_mode: ReadWriteOnce
  admin_password_secret: awx-admin-password
  admin_user: admin
  ingress_annotations: |
    "kubernetes.io/ingress.class": "traefik"
    "traefik.ingress.kubernetes.io/router.entrypoints": "websecure"
    "traefik.ingress.kubernetes.io/router.tls": "true"
  ingress_path: /
  ingress_path_type: Prefix
  ingress_tls_secret: example-com-cf
  hostname: awx.example.com
---
apiVersion: v1
kind: Secret
metadata:
  name: awx-postgres-configuration
  namespace: awx
stringData:
  host: acid-awx-pooler.awx-postgres
  port: "5432"
  database: awx
  username: awx
  password: password
  type: unmanaged
  sslmode: require
type: Opaque
---
apiVersion: v1
kind: Secret
metadata:
  name: awx-old-postgres-configuration
  namespace: awx
stringData:
  host: "ip"
  port: "5432"
  database: awx
  username: awx
  password: password
type: Opaque

Expected results

I expect it to use the already deployed postgres cluster

Actual results

Fails to work. See log file

Which leads me to: https://github.com/ansible/awx-operator/blob/devel/roles/installer/tasks/migrate_data.yml#L31

It seems like when you migrate, it is expecting to manage postgres?

Additional information

https://github.com/ansible/awx-operator/issues/1240

Creating new issue as requested

Operator Logs

awx-operator-controller-manager-58b5b7698b-4c2x6_awx-manager.log

jonathon2nd avatar May 10 '23 21:05 jonathon2nd

I'm encountering the same issue. Trying to migrate from an old awx 17 VM/psql to an Operator-controlled deployment with external psql (because no to databases on k8s...).

awx-operator:2.3.0 old psql: postgres:12 new psql: 14.8 (Ubuntu package)

AWX:

kind: AWX
metadata:
  name: awx
  namespace: awx
spec:
  service_type: ClusterIP
  ingress_type: ingress
  ingress_class_name: nginx
  ingress_annotations: |
    nginx.ingress.kubernetes.io/affinity: "cookie"
    nginx.ingress.kubernetes.io/session-cookie-name: "awx-kube-sticky"
    nginx.ingress.kubernetes.io/session-cookie-expires: "172800"
    nginx.ingress.kubernetes.io/session-cookie-max-age: "172800"
  hostname: awx1.[example.com]
  secret_key_secret: awx-secret-key
  admin_user: admin
  admin_email: administrator@[example.com]
  admin_password_secret: awx-admin-password
  postgres_configuration_secret: awx-postgres-configuration
  ipv6_disabled: true
  image_pull_policy: Always

awx-old-postgres-configuration:

apiVersion: v1
kind: Secret
metadata:
  name: awx-old-postgres-configuration
  namespace: awx
stringData:
  host: "awx.[example.com]"
  port: "5432"
  database: "db"
  username: "user"
  password: "pw"
type: Opaque

awx-postgres-configuration:

apiVersion: v1
kind: Secret
metadata:
  name: awx-postgres-configuration
  namespace: awx
stringData:
  host: postgresql.[example.com]
  port: "5432"
  database: db2
  username: user2
  password: pw2
  sslmode: prefer
  type: unmanaged
type: Opaque

operator log: op-log.txt

As a workaround, does awx/the operator care if I do the migration to a managed DB via a copy of my secret with a different name and set to managed, move that db to my intended db server and then just change the secret back to my current secret, hopefully causing a deployment update?

Tatsu-Kishi avatar Jul 06 '23 09:07 Tatsu-Kishi

@rcarrillocruz I think you got this report from another sources, are you tracking this one?

Zokormazo avatar Aug 16 '23 15:08 Zokormazo

I have manually setup new AWX. After finally getting AWX up and running with external postgres, I attempted another migration and it failed. So just manually setup templates, creds and inventory+project.

jonathon2nd avatar Oct 20 '23 21:10 jonathon2nd

@jonathon2nd I have a AWX in 9.3.0 which I managed to migrate to AWX Operator and run version 23.3.0. I did have to jump thru different versions but I found it easier to have a Bitnami Postgresl deployed and run the database as external for the new AWX, this way I can manage to do a sql import from a dump of the old database.

So, try this:

  • Deploy AWX Operator
  • Deploy Bitnami Postgresl, I have tried 13 to 15 and works
  • Create a Kubernetes Job that pull a dump from the old darabase and imports into the new Bitnami. I have a custom inage that does this for hourly backups of the database and sends to S3 bucket, also have another command to restore from latest or given backup in S3.
  • Once imported sql dump, deploy AWX with the external Bitnami database config for the connection.
  • Watch the migration to see the progress.

This method has work for me multiple times. I even tried using Bitnami Postgresl HA but pgpool doesnt play nicely with django cursor.py.

rchaud avatar Nov 30 '23 02:11 rchaud

This looks similar to what was happening here in the pulp/galaxy operator:

  • https://github.com/pulp/pulp-operator/pull/1027/files

@Zokormazo this what the issue you were thinking of that @rcarrillocruz was working on.

@jonathon2nd The tricky thing here though is that the migration logic makes the assumption that you are migrating to a managed db in a k8s pod:

  • https://github.com/ansible/awx-operator/blob/devel/roles/installer/tasks/migrate_data.yml#L60

I think that to add support for using this migration logic for external postgres instances, we would need to do something like:

  • create a management pod (maybe managed by a k8s job)
  • have that management pod connect to the old postgres instance via psql, then redirect the data stream of the pg_dump to a pg_restore in the new postgres instance.

There are some tricky issues to work through though:

  • You may need to temporarily write the pg_backup to a a tmpfile on the management pod, the cat it when piping to the pg_restore. For large databases, it might use up all of the ephemeral storage on the cluster if a PVC is not used to back this temporary write.
  • You would need to connect to the new pg instance as the postgres user, or pg admin user, with database set to template1 I think.

These are just some thoughts and recollections from previous experimenting, not to be taken as fact. But if you want to take a crack at implementing this, a PR would certainly be welcome.

At the moment, if you migrate clusters and use an external postgres, it is on the user to migrate the data from the old postgres to the new one.

rooftopcellist avatar Dec 01 '23 22:12 rooftopcellist

Is this issue still open?

SwiperNo avatar Apr 03 '24 11:04 SwiperNo

Just tried to install AWX via operator with external postgres. It seems like AWX does not create any tables with external managed postgres. The awx operator creates the migration jobs but in my fresh bootstrapped awx database i have no tables and awx / operator doesn't create any tables!

You should either create the tables you need if the don't exist or expand your documentation with a DB bootstrap file ...

Regards, Kilian

discostur avatar Jun 04 '24 17:06 discostur

I have faced exact same problem yesterday, when just trying to migrate from the built-in managed postgresql database to external unmanaged database. This is because the code is currently written in such a way, that the target for pg_restore is always a managed database inside k8s container where both pg_dump and pg_restore are called.

There are currently 3 issues that would beed to be fixed in order for this to work:

  1. always specify -h parameter (host) for both pg_dump and pg_restore and do not do implicit trust auth against target in pg_restore (simple)
  2. maybe improve handling of credentials, as current method relies on environment variables in postgresql pod (especially old PG password). This can be solved either by re-creating postgresql statefulset (the only way that it adds new env variable after specyfing old_postgres_configuration_secret), or use rendered .pgpass file inside container to do the authentication and remove it afterwards
  3. for the moment either the old or the current postgresql database HAS TO BE a managed database, as all the code runs against a k8s container. I do not see an easy way to handle scenatios when migrating from one unmanaged to another unmanaged database.

In terms of points 1-2 i can help and even provide a PR that will fix it should this be wanted.

Actually when I think about this once again, even point 3 would be doable with additional management pod based on postgresql image having both pg_dump and pg_restore. It might even solve problems 1-2 as well, as it could have all needed env variables since it needs to be created anyway, and it would have to connect to both databases since none of them would be running inside of it.

bartowl avatar Jun 28 '24 07:06 bartowl

Just so I'm clear, are you saying that there is no way to spin up a new AWX instance that uses a new external postgresql install (DB/user/pass created)? I'd have to create a new AWX instance with a managed database, then do a DB migration to the external DB?

stevenahmet avatar Jul 02 '24 06:07 stevenahmet

It seems that you guy are really over complicating things.

Look do this. Create a job or deployment with pgsl in the same namespace as AWX with persistent volume. Open a shell in the container, dump the awx database. Remove awx deployment. Deploy postgres from Bitnami or your favorite. Dont forget to create a secret that awx will use for the external connection. Connect to the container used for the dump, connect to the new bitnami db and restore the db. Create awx with new external db. Everything should be restored as normal.

I have all this process automated to dump my db horly using k8s cronjob and uplod to S3 and for restores I can pull from S3 and restore the db using a job in k8s.

I have only restored 1 or 2 times when the pvc became corrupted at the storage level.

Please reach out if you need some guidance.

rchaud avatar Jul 02 '24 06:07 rchaud

Just so I'm clear, are you saying that there is no way to spin up a new AWX instance that uses a new external postgresql install (DB/user/pass created)? I'd have to create a new AWX instance with a managed database, then do a DB migration to the external DB?

Actually exactly the opposite. It is not possible to migrate to external pg database with the built-in workflow in AWX operator. I think it should be just fine to setup a completely new instance with external postgresql but did not look into that part of code. For sure however having a localy awx-managed postgresql database it is not possible to use the built in migration workflow to migrate to an external database.

And what @chinochao stated is correct - this is what I actyually did. I execed into the running managed postgresql database, manually started pg_dump ... | pg_restore ... to migrate to external database and then updated resource definition to point only to external database. This ticket is however about migration to external database with built-in workflow failing.

As I already said - it should be possible without much effort to migrate between internal and external database in any direction. Just one -h parameter to pg_restore would fix it. In order to be able to migrate between 2 external databases though one would need extra job container and a bit more code changes...

BartOpitz avatar Jul 02 '24 07:07 BartOpitz