
Sync of the Data Entities is not happening after cross-namespace recovery

Open mavenzer opened this issue 1 year ago • 1 comments

To test the resilience of our deployment patterns, we have deployed ODD in many namespaces in a Kubernetes cluster. To validate recovery from one namespace to another (say, from Dev to Production), we use a cronjob as the backup mechanism. Since we are using Bitnami Postgres as the database, it's pretty easy to write a cronjob that dumps the database and to test it out.

PGPASSWORD=$POSTGRES_PASSWORD pg_dumpall -U $POSTGRES_USER -h $POSTGRES_HOST -p $POSTGRES_PORT > /backups/all-backups-$TIMESTAMP.sql

We can easily do the recovery within the same namespace, but the problem starts with a different namespace (from DEV to PROD), i.e. we wanted to recover the data from the dev namespace into the prod namespace using the .sql file from the dev namespace.

Steps we followed for the recovery:

  • Scaled down the ODD collector and ODD Platform
  • Copied the .sql file from DEV to PROD
  • Recovered the data using the psql shell.
  • Updated the data source in the collector:
UPDATE data_source
SET oddrn = REPLACE(oddrn, 'odd-postgresql.dev-cluster.svc.cluster.local', 'odd-postgresql.prod-cluster.svc.cluster.local')
WHERE namespace_id = 1;
  • Added the same token to the Collector pod that was used in the Dev namespace.
  • Started the ODD platform and ODD collector

What we found is exactly 2x the number of data entities in the ODD platform, because there are two copies of each data entity: one for the DEV namespace and one for the PROD namespace. The data is there for the dev namespace but not for the prod namespace.
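One plausible reading of the doubled count, sketched in Python under the assumption that each entity's ODDRN embeds the database host (as the REPLACE call above implies): restored rows keep the dev host, the prod collector reports the same objects under the prod host, and since the strings differ they are treated as distinct entities. The ODDRN layout and table names below are hypothetical and only illustrate the mechanism.

```python
DEV_HOST = "odd-postgresql.dev-cluster.svc.cluster.local"
PROD_HOST = "odd-postgresql.prod-cluster.svc.cluster.local"


def make_oddrn(host: str, table: str) -> str:
    # Hypothetical ODDRN layout, for illustration only.
    return f"//postgresql/host/{host}/databases/app/tables/{table}"


tables = ["users", "orders", "payments"]  # made-up table names

# Rows restored from the dev dump still carry the dev host.
restored = {make_oddrn(DEV_HOST, t) for t in tables}
# The prod collector reports the same tables under the prod host.
ingested = {make_oddrn(PROD_HOST, t) for t in tables}

# No ODDRN matches, so every entity appears twice.
combined = restored | ingested
print(len(tables), len(combined))

# Rewriting the host in every restored ODDRN before ingest makes them line up.
rewritten = {oddrn.replace(DEV_HOST, PROD_HOST) for oddrn in restored}
print(len(rewritten | ingested))
```

If this reading is right, the host has to be rewritten everywhere an ODDRN is stored, not only in data_source.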

Screenshots: data-prod, data-dev

mavenzer, May 29 '24 17:05

Hi, ODDRN is a unique identifier for data_entity, data_source, and dataset_field (More info). So, in that case, updating the data_source table won't be enough; you also need to update these tables:

  • public.data_entity;
  • public.dataset_field;
  • public.dataset_version;
  • public.data_entity_task_run;
  • public.erd_relationship_details;
  • public.relationships;
  • public.metric_entity;
  • public.data_quality_test_relations;
  • public.data_entity_task_last_run;
  • public.alert;
  • public.lineage;
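The thread names the ODDRN column only for data_source, so a cautious way to apply this list is to look up the candidate columns first and then generate one UPDATE per (table, column) pair, reusing the REPLACE approach from the original post. The Python sketch below only builds SQL strings. The helper names, the `%oddrn%` column-name filter, and the example column are assumptions, so review the generated SQL and try it on a copy of the database first.

```python
TABLES = [
    "data_entity", "dataset_field", "dataset_version", "data_entity_task_run",
    "erd_relationship_details", "relationships", "metric_entity",
    "data_quality_test_relations", "data_entity_task_last_run", "alert", "lineage",
]


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def discovery_query(tables=TABLES) -> str:
    """Catalog query for text columns whose name contains 'oddrn' (assumed naming)."""
    names = ", ".join(sql_literal(t) for t in tables)
    return (
        "SELECT table_name, column_name FROM information_schema.columns\n"
        "WHERE table_schema = 'public'\n"
        f"  AND table_name IN ({names})\n"
        "  AND column_name LIKE '%oddrn%'\n"
        "  AND data_type IN ('text', 'character varying')\n"
        "ORDER BY table_name, column_name;"
    )


def build_update(table: str, column: str, old_host: str, new_host: str) -> str:
    """UPDATE that swaps old_host for new_host in one text column."""
    col = quote_ident(column)
    old, new = sql_literal(old_host), sql_literal(new_host)
    return (
        f"UPDATE public.{quote_ident(table)}\n"
        f"SET {col} = REPLACE({col}, {old}, {new})\n"
        f"WHERE strpos({col}, {old}) > 0;"
    )


print(discovery_query())
# "oddrn" as the data_entity column name is an assumption; use the discovery result.
print(build_update(
    "data_entity", "oddrn",
    "odd-postgresql.dev-cluster.svc.cluster.local",
    "odd-postgresql.prod-cluster.svc.cluster.local",
))
```

The type filter skips array or JSON columns, which would need separate handling if any of them store ODDRNs. Running the generated statements inside a single transaction lets you roll back if one of them fails.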

During the next collector ingest, the vectors for these data entities will be updated in public.search_entrypoint. (P.S. If some entities are not part of the ingest, facet search may not work for them.)

NOTE: You need to perform all these changes before the Collector launches.
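Before starting the Collector, it can help to confirm that no row still references the old host. The Python sketch below builds a single query that counts leftover references per (table, column) pair. The helper names are hypothetical, and apart from data_source.oddrn (used in the original post) the column names are assumptions to be replaced with the real ones from your schema.

```python
def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def leftover_query(pairs, old_host: str) -> str:
    """One SELECT per (table, column), combined with UNION ALL."""
    host = sql_literal(old_host)
    selects = [
        f"SELECT {sql_literal(table + '.' + column)} AS location, "
        f"count(*) AS remaining FROM public.{quote_ident(table)} "
        f"WHERE strpos({quote_ident(column)}, {host}) > 0"
        for table, column in pairs
    ]
    return "\nUNION ALL\n".join(selects) + ";"


# data_source.oddrn appears in the original post; data_entity.oddrn is assumed.
pairs = [("data_source", "oddrn"), ("data_entity", "oddrn")]
print(leftover_query(pairs, "odd-postgresql.dev-cluster.svc.cluster.local"))
```

Every `remaining` value should be 0 before the Collector is scaled back up.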

Vladysl, May 30 '24 13:05