
Transformer between importer and data store

Open: jkiviluo opened this issue 3 years ago (12 comments)

Data Transformers do not support transformations between an Importer and a Spine Data Store. That would be nice to have.

jkiviluo, Mar 17 '21 22:03

Looks like I broke the Data Transformer with my latest changes. It should be fixed now, at least when it is connected between a Data Store and an Exporter. Making the Transformer work between an Importer and a Data Store is a completely different story. While waiting for that to be resolved, you can:

  1. Import the data into a temporary database and connect that database to the actual database via a Data Transformer.
  2. Write a Tool script that does the needed transformations to the source data before feeding it to the Importer.

soininen, Mar 18 '21 06:03
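The second workaround could look roughly like the following preprocessing Tool. This is only a sketch under assumed inputs: the CSV columns (`class`, `name`, `value`), the `CLASS_RENAMES` table and the MW-to-kW scaling are all hypothetical and not anything Spine Toolbox prescribes.

```python
import csv

# Hypothetical renaming rule; adapt to the actual source data.
CLASS_RENAMES = {"gen": "unit", "load": "node"}
# Hypothetical unit conversion: source values in MW, model expects kW.
SCALE = 1000.0


def transform_rows(rows):
    """Yield copies of CSV rows with renamed classes and rescaled values."""
    for row in rows:
        out = dict(row)
        out["class"] = CLASS_RENAMES.get(row["class"], row["class"])
        out["value"] = str(float(row["value"]) * SCALE)
        yield out


def transform_csv(source, target):
    """Read CSV from the file object `source`, write the transformed CSV to `target`."""
    reader = csv.DictReader(source)
    writer = csv.DictWriter(target, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(transform_rows(reader))
```

In a Tool, `transform_csv` would be called with the raw input file opened for reading and an output file (opened with `newline=""`) that the Importer is then pointed at.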

In my particular case, I can do the transformation between the DB and the Exporter. I will change the issue title to "Importer - Data Store".

jkiviluo, Mar 18 '21 07:03

Thanks for updating the title and description. The actual feature request here is now much clearer.

I can think of two ways this could be done:

  1. Apply the transformation at import time in import_mappings. How feasible this is, and how much it would complicate the API, needs to be investigated.
  2. Apply transformations in place after import, i.e. import the data as-is, then transform it within the database. This would be nice in that we could apply these transformations to any existing database at any time. Problems might arise with name clashes at import, though. No idea if this is even feasible.

soininen, Mar 18 '21 07:03
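The name-clash concern in option 2 can be illustrated with a small in-place rename. This is a minimal sketch against a hypothetical `entity(class_name, name)` table with a uniqueness constraint, not the real Spine database schema or the spinedb_api interface. The function `rename_in_place` is an illustrative name.

```python
import sqlite3


def rename_in_place(conn: sqlite3.Connection, class_name: str, renames: dict) -> None:
    """Rename entities of one class after import, refusing on name clashes.

    Assumes a hypothetical table entity(class_name, name) with
    UNIQUE(class_name, name); this is not the real Spine database schema.
    """
    existing = {
        name
        for (name,) in conn.execute(
            "SELECT name FROM entity WHERE class_name = ?", (class_name,)
        )
    }
    # Conservative check: any target name that already exists counts as a clash,
    # even if that entity is itself being renamed away.
    clashes = sorted(
        new for old, new in renames.items() if old in existing and new in existing
    )
    if clashes:
        raise ValueError(f"name clash in class {class_name!r}: {clashes}")
    with conn:  # single transaction: either all renames are committed or none
        for old, new in renames.items():
            conn.execute(
                "UPDATE entity SET name = ? WHERE class_name = ? AND name = ?",
                (new, class_name, old),
            )
```

Refusing up front rather than letting the database raise an integrity error keeps the imported data untouched and reports every clashing name at once. Whether a real in-place transform should refuse, merge or overwrite is exactly the open question raised above.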

In-place transformation is actually already doable: just connect two Data Stores pointing to the same database via a Transformer. Case solved.

soininen, Mar 18 '21 07:03

I wouldn't put a high priority on this. Reasonably good functionality can be achieved by having a Transformer between two Data Stores, which is supported.

jkiviluo, Mar 18 '21 07:03

And your solution is even nicer. Although how does it play with DAG order?

jkiviluo, Mar 18 '21 07:03

> Although how does it play with DAG order?

You have two data stores using the same database. That plays very well with the DAG.

soininen, Mar 18 '21 07:03

Ok, right. I thought you meant that there would be a small loop from DS to transformer and back to DS.

jkiviluo, Mar 18 '21 07:03
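Why two Data Stores sharing one database do not form a loop can be checked with a plain topological sort. The sketch below models project items as DAG nodes using Python's standard `graphlib` (3.9+). The item names and the `sqlite:///model.sqlite` URL are hypothetical, and this is not how Spine Toolbox builds its DAG internally. The database URL is only an attribute of a node and adds no edge.

```python
from graphlib import CycleError, TopologicalSorter  # Python 3.9+

# Hypothetical project: each item is a separate DAG node; the database URL is
# just an attribute of a node and does not create any edge.
urls = {
    "DS raw": "sqlite:///model.sqlite",
    "DS transformed": "sqlite:///model.sqlite",  # same database as "DS raw"
}

# Map each item to its predecessors (upstream items).
two_stores = {
    "Importer": set(),
    "DS raw": {"Importer"},
    "Transformer": {"DS raw"},
    "DS transformed": {"Transformer"},
}

# The loop first assumed above: one store both feeding and fed by the Transformer.
one_store_loop = {
    "Importer": set(),
    "DS": {"Importer", "Transformer"},
    "Transformer": {"DS"},
}


def execution_order(graph):
    """Return a valid execution order, or raise CycleError if the graph has a loop."""
    return list(TopologicalSorter(graph).static_order())
```

`execution_order(two_stores)` yields the chain Importer, DS raw, Transformer, DS transformed, while `execution_order(one_store_loop)` raises `CycleError`.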

How about the DT advertising an in-memory database backwards?

Importer -> DT -> DS

The Importer would import data into the in-memory db, the DT would apply the 'transformation filter' on that db, and the DS would merge that db into its own physical db.

That could work if in-memory dbs were shareable by URL, but they are only shared by 'connection instance'...

manuelma, Mar 18 '21 07:03
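The "only shared by connection instance" limitation can be reproduced with plain SQLite. In this sketch the functions `importer`, `transformer` and `data_store_merge` are hypothetical stand-ins for the three items, not Spine Toolbox code. The pipeline works when every step receives the same connection object. It fails when each step opens its own `:memory:` connection, because every such connection gets a brand-new, private database.

```python
import sqlite3


def importer(conn):
    """Hypothetical Importer step: load raw data into the staging database."""
    conn.execute("CREATE TABLE value (name TEXT, x REAL)")
    conn.executemany("INSERT INTO value VALUES (?, ?)", [("a", 1.0), ("b", 2.0)])


def transformer(conn):
    """Hypothetical DT step: apply a 'transformation filter' to the staged data."""
    conn.execute("UPDATE value SET x = x * 10")


def data_store_merge(conn, target):
    """Hypothetical DS step: copy the staged rows into the store's own database."""
    rows = conn.execute("SELECT name, x FROM value ORDER BY name").fetchall()
    target.execute("CREATE TABLE IF NOT EXISTS value (name TEXT, x REAL)")
    target.executemany("INSERT INTO value VALUES (?, ?)", rows)
    target.commit()


def run_pipeline(connect_staging, target):
    """Run Importer -> DT -> DS; each step obtains the staging db via connect_staging()."""
    importer(connect_staging())
    transformer(connect_staging())
    data_store_merge(connect_staging(), target)
```

Calling `run_pipeline(lambda: shared, target)` with one pre-opened `shared` connection succeeds. Passing `lambda: sqlite3.connect(":memory:")` instead raises `sqlite3.OperationalError` (no such table) in the transformer step.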

> That could work if in-memory dbs were shareable by URL, but they are only shared by 'connection instance'...

Indeed, that makes them unusable in many scenarios, unfortunately.

soininen, Mar 18 '21 07:03
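SQLite does offer a partial way around this: a *named* in-memory database opened through a URI with `mode=memory&cache=shared` is visible to every connection that uses the same URI. The catch is that this works only within a single process, and the database disappears when its last connection closes. So it would not help if the items run in separate processes. The name `staging` below is hypothetical, and whether this would fit Spine Toolbox's execution model is an open assumption.

```python
import sqlite3

# Standard SQLite URI for a named in-memory database in shared-cache mode.
# "staging" is a hypothetical name.
STAGING_URI = "file:staging?mode=memory&cache=shared"


def connect_staging():
    """Open a new connection to the shared in-memory staging database."""
    return sqlite3.connect(STAGING_URI, uri=True)


# The database exists only while at least one connection to it is open.
keeper = connect_staging()

writer = connect_staging()
writer.execute("CREATE TABLE value (name TEXT, x REAL)")
writer.execute("INSERT INTO value VALUES ('a', 1.0)")
writer.commit()

reader = connect_staging()  # a separate connection object sees the same data
rows = reader.execute("SELECT name, x FROM value").fetchall()
```

The `keeper` connection exists only to hold the database open between steps. Without it, closing the other connections would silently discard the staged data.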

I don't know, there might be a way... The solution with two DSs pointing to the same URL is good, but it might be a little bit too clever, don't you think?

On the other hand, Importer -> DT -> DS seems logical. It's only an implementation detail on our part that prevents it from working, right? (that we only share stuff by url)

manuelma, Mar 18 '21 08:03

> It's only an implementation detail on our part that prevents it from working, right? (that we only share stuff by url)

Right. We could make Importer -> DT -> DS work with URLs, for example, if the DT passed the Importer the DS's URL with some clever write-to-temporary-alternative filter. The Importer would then write to that alternative. When the DT's turn to execute came, it would transform the data from the special alternative in place.

soininen, Mar 18 '21 11:03
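The write-to-temporary-alternative idea can be sketched against a toy table. Everything here is hypothetical: the `parameter_value` table is a simplified stand-in (not the real Spine schema), and `TMP_ALT`, `import_to_temp` and `transform_temp` are illustrative names, not an existing filter or API. The Importer writes only into the temporary alternative. The DT later rewrites those rows into the target alternative and removes the temporary ones in a single transaction.

```python
import sqlite3

# Hypothetical name for the temporary alternative the Importer would write to.
TMP_ALT = "__import_tmp"


def create_schema(conn):
    """Simplified, hypothetical stand-in for a Spine database (not the real schema)."""
    conn.execute(
        "CREATE TABLE parameter_value (entity TEXT, parameter TEXT, alternative TEXT,"
        " value REAL, PRIMARY KEY (entity, parameter, alternative))"
    )


def import_to_temp(conn, records):
    """Importer side: write (entity, parameter, value) records into TMP_ALT only."""
    with conn:
        conn.executemany(
            "INSERT INTO parameter_value VALUES (?, ?, ?, ?)",
            [(entity, param, TMP_ALT, value) for entity, param, value in records],
        )


def transform_temp(conn, target_alt, rename, scale):
    """DT side: transform TMP_ALT rows, move them to target_alt, then drop TMP_ALT rows."""
    with conn:  # one transaction, so a failure leaves the database unchanged
        rows = conn.execute(
            "SELECT entity, parameter, value FROM parameter_value WHERE alternative = ?",
            (TMP_ALT,),
        ).fetchall()
        conn.executemany(
            "INSERT OR REPLACE INTO parameter_value VALUES (?, ?, ?, ?)",
            [(e, rename.get(p, p), target_alt, v * scale) for e, p, v in rows],
        )
        conn.execute("DELETE FROM parameter_value WHERE alternative = ?", (TMP_ALT,))
```

Using `INSERT OR REPLACE` is one possible design choice: freshly imported values overwrite whatever the target alternative already held. A stricter variant could refuse on conflicts instead, which ties back to the name-clash concern raised earlier in the thread.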