fhir-river icon indicating copy to clipboard operation
fhir-river copied to clipboard

Changing source table name on the datalake imposes to update the whole mapping

Open nriss opened this issue 3 years ago • 4 comments

Description

I started to map a mssql database for croissant project, and quickly saw that the database schema needed preprocessing before making the mapping on pyrog. After some discussion, we decided to implement dbt and airbyte.

The goal of airbyte is to duplicate the database source from mssql to a postgres database. It will then change all table names by adding a prefix. The next step is to use DBT to create table views from custom sql requests.

Problem

The problem is that I mapped the mssql database on pyrog, and I have to adapt the whole mapping because the table names changed (from mssql prod database to the datalake in postgres).

Solution

The first solution is to adapt the whole mapping which is really time consuming

Another solution is to export the mapping, search and replace the old_table_names to the new_table_names, bit it imposes other difficulties : If I launch an ETL on the imported mapping, the resources will be duplicated because the ids will change.

So, I am wondering if this case will happen often or not, I already had the case with the osiris project where I had to adapt the mapping to other centre.

Another usecase : when using dbt, the created tables will take the name of the sql file. When doing a mapping on these tables, we cannot change the name once defined and mapped.

Experience

I exported the mapping, replaced the table names, reset the FHIR database and reimported it. I forgot some tables (like tables only used on join) and lose time to adapt it.

Discussed with @MiskoG and @Jasopaum

nriss avatar Jul 02 '21 16:07 nriss

Thanks @nriss for taking the time to write things down here. First of all, having to re-do things manually as you did especially in the current Pyrog interface must be a pain 😬

Now, in your story I think that there is food for thoughts

(1) We should be able to empty a FHIR datawarehouse easily. It seems a pretty common thing to me in a data product (for instance in Airbyte while maintaining a A -> B pipeline, they let you "reset" your data in your B destination).

image

To me it's quite a no brainer, and we could have this button in our new Pyrog interface (in the section dealing with batches). @simonvadee If my memory is not bad I remember you talking about this topic with @elsiehoffet-94.

(2) About the fact of having to remap all your FHIR objects because the names of the source tables have changed, I don't see any better option. Using the export/import mapping feature, and thus modifying the names directly in a flat file as you did looks better to me, rather than redo the whole mapping from scratch in the interface.

To sum up my opinion is that (1) we should be able to empty the FHIR datawarehouse easily from Pyrog - allowing you to run a new ETL from scratch (2) the import/export feature MUST work perfectly (if it is not the case let us know) but I don't see any better option when it happens that the source data model changes. I would call this issue "adapting the data connector" and we all hope this is not a very frequent task..

MiskoG avatar Jul 06 '21 16:07 MiskoG

I'm catching up with this issue!

(1) we should be able to empty the FHIR datawarehouse easily from Pyrog - allowing you to run a new ETL from scratch

Ok, although it should be done with care! Did we get anything started regarding this?

(2) the import/export feature MUST work perfectly (if it is not the case let us know) but I don't see any better option when it happens that the source data model changes. I would call this issue "adapting the data connector" and we all hope this is not a very frequent task..

I agree, but this also raises a lot of food for thought. What about the case when we ran the whole ETL, it took quite a long time, and we have to change only one table name. Or if, while running live, a table name changes. If we change the mapping and import it, it will probably break all existing references. Should we delete everything and start over ?

elsiehoffet-94 avatar Aug 26 '21 10:08 elsiehoffet-94

We could do the same as for feeding the FHIR DWH with ressources : we can choose to process all ressources or only a fraction of them. Maybe we can do the same with deleting those ressources (delete all of them or only one for instance). Don't know if there could still be some reference issues though.

MiskoG avatar Aug 26 '21 15:08 MiskoG

(1) I'm just asking another question: if we delete a source or a mapping on pyrog, what do you think of deleting the corresponding data from the FHIR dwh ? Not sure it is pertinent to keep the generated instances if we delete the source / mapping from pyrog

Moreover, it is difficult to delete these resources manually if we have deleted it from pyrog (loss of the source and resource id)

(2) Its a pain for now to export the mapping, delete the concerned resources from the fhir dwh, modify the mapping by hand and reimport it... Hope it will be easier

nriss avatar Aug 30 '21 14:08 nriss