Import editor: link the source with a mapping and an alternative

Open spine-o-bot opened this issue 4 years ago • 6 comments

In GitLab by @jkiviluo on Feb 3, 2021, 14:46

Users may want to change their source data tables, and these may not be the same on different local machines. There should also be more flexibility in connecting a specific source (table) to a specific mapping, and alternatives should be handled in a flexible manner (sometimes they are part of the data, sometimes not). Here's my proposal discussed with @soininen. @manuelma is thinking about implementation.

  • Mappings need to be independent of data sources (already achieved, but linking to sources will be improved).
  • CSV files are single table source files. Excel and most others are multi-table source files. Table is the common denominator.
  • A single Excel sheet can contain multiple 'tables' (hence multiple mappings may be required --> already achieved, but is considered here too)
  • Often it can be that a whole file or a whole table should go to a specific alternative (multiple versions of the same file). But sometimes the table data contains the alternative information - e.g. as a dedicated column.
  • We need to link the source table to a specific mapping and to a specific alternative in the import editor.
  • We shouldn't make things harder for the user
    • Typically the mapping is for a sheet - the default should be an empty mapping with the same name as the sheet
    • Probably the best default for the alternative is to pick the source name (but you should be able to change that)
  • This allows linking new input data sources, which will either be recognised (through the table/sheet name) or left unrecognised if there is no match
    • If there is no match, the user will need to link to an existing mapping or define a new one
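
The default described in the list above could be sketched roughly as follows. This is only an illustration: `Mapping` and `default_mapping_for` are hypothetical names made up for this example, not existing Spine Toolbox classes or functions.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Mapping:
    """Hypothetical stand-in for an import mapping (not the Spine Toolbox class)."""

    name: str
    rules: list = field(default_factory=list)  # an empty list means an empty mapping


def default_mapping_for(table_name: str, mappings: dict[str, Mapping]) -> tuple[Mapping, bool]:
    """Return (mapping, recognised) for a source table.

    A table is recognised when a mapping with the same name already exists.
    Otherwise an empty mapping named after the table is created and stored,
    and the user can later relink the table to another mapping.
    """
    if table_name in mappings:
        return mappings[table_name], True
    new_mapping = Mapping(name=table_name)
    mappings[table_name] = new_mapping
    return new_mapping, False
```

Returning the `recognised` flag alongside the mapping would let the interface highlight tables that still need attention from the user.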

There would be a separate table that lists available source files (those linked with the importer). It has two columns - one pointing to the actual file and another to a SOURCE 'label'. If you import a project and the input file is not included, then this sourcefile can be empty and show a warning, but the SOURCE 'label' would still be there helping the user to recover the situation. It also helps to swap input files (you might have multiple versions of the same input file - that's the common way to model when using e.g. Excel files as an input for models).

The selection would then have four columns.

  • 'SOURCE' would indicate the source through the label. It's a list based on the available sourcefiles.
  • 'SHEET' would indicate the sheet in that file (a list based on the tables available in the SOURCE - a bit of thinking required what to do when the sourcefile has been lost).
  • 'MAPPING' column would also be a list. Mapping would be based on the list of mappings (and empty mappings are created automatically if there is no mapping for the sheet/table).
  • 'ALTERNATIVE' could be named by the user, or it could be [none], [source file name], or [set in table]. The last option means that the mapping needs to include a mapping to show where the alternative names are in the data table.

Sourcefiles      SOURCE
input_v1.xlsx    input_v1
input_v2.xlsx    input_v2
input_v3.xlsx    input_v3

SOURCE      TABLE    MAP            ALTERNATIVE
input_v1    units    units          [none]
input_v1    nodes    nodes          [source name]
input_v2    nodes    nodes          alt_2
input_v3    nodes    wrong_nodes    [none]
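
To make the two tables above more concrete, here is a minimal sketch of how the rows could be represented and how the ALTERNATIVE column could be resolved. All names here (`Selection`, `fixed_alternative`, `missing_sources`) are assumptions for illustration only and do not refer to existing Spine Toolbox code.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

# Placeholder values for the ALTERNATIVE column, written as in the example table.
NONE = "[none]"
SOURCE_NAME = "[source name]"
SET_IN_TABLE = "[set in table]"


@dataclass
class Selection:
    """One row of the selection table (hypothetical structure)."""

    source: str       # SOURCE label, not the file path
    table: str        # sheet/table inside the source
    mapping: str      # name of the mapping to apply
    alternative: str  # user-given name or one of the placeholders above


def fixed_alternative(row: Selection) -> Optional[str]:
    """Return the alternative that applies to the whole table, or None.

    None covers both [none] and [set in table]; in the latter case the
    mapping itself has to say where the alternative names are in the data.
    """
    if row.alternative in (NONE, SET_IN_TABLE):
        return None
    if row.alternative == SOURCE_NAME:
        return row.source
    return row.alternative


def missing_sources(rows: list[Selection], source_files: dict[str, Optional[str]]) -> list[str]:
    """List SOURCE labels used by the rows that have no file attached."""
    return sorted({r.source for r in rows if not source_files.get(r.source)})


# SOURCE label -> file; the file may be None if it was not shipped with the project.
source_files = {
    "input_v1": "input_v1.xlsx",
    "input_v2": "input_v2.xlsx",
    "input_v3": "input_v3.xlsx",
}

rows = [
    Selection("input_v1", "units", "units", NONE),
    Selection("input_v1", "nodes", "nodes", SOURCE_NAME),
    Selection("input_v2", "nodes", "nodes", "alt_2"),
    Selection("input_v3", "nodes", "wrong_nodes", NONE),
]

for row in rows:
    print(row.source, row.table, row.mapping, fixed_alternative(row))
```

Because the rows refer to the SOURCE label rather than the file path, swapping `input_v1.xlsx` for another version only touches `source_files`, and a lost file shows up through `missing_sources` while the rows stay intact.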

Antti thinks this is mainly playing with the semantics and the interface - the machinery is mostly in place.

Generic exporter could implement something similar.

Related issue (this issue should solve that one too): https://github.com/spine-tools/Spine-Toolbox/issues/1423

spine-o-bot avatar Feb 04 '21 10:02 spine-o-bot

In GitLab by @jkiviluo on Feb 3, 2021, 14:48

changed the description

spine-o-bot avatar Feb 04 '21 10:02 spine-o-bot

In GitLab by @jkiviluo on Feb 3, 2021, 14:48

changed title from Import editor: link {-source with a map and-} alternative to Import editor: link {+the source with a mapping and an+} alternative

spine-o-bot avatar Feb 04 '21 10:02 spine-o-bot

In GitLab by @manuelma on Feb 3, 2021, 19:06

What about the datapackage connector? You're not happy with how it works? It's in place for precisely this purpose.

spine-o-bot avatar Feb 04 '21 10:02 spine-o-bot

I didn't remember it. Which raises the question of how we make things more apparent to the user (I think the approach suggested here would do that in this case).

Data packaging also doesn't cover the whole scope of the issue. I'm also suggesting that a very typical use case for alternatives is to have different sources that you want to map to different alternatives. This would enable that. It also more clearly separates the mapping from the source - you could re-use the same mapping much more freely after this.

jkiviluo avatar Feb 05 '21 06:02 jkiviluo

I think there're at least two things mixed in this issue.

  • One is lumping multiple CSV files together to define only one mapping spec for them. That was already covered or initiated by the datapackage connector. If we don't want to use the datapackage connector I think we should just remove the datapackage functionality we have implemented, as it's becoming almost pointless?
  • Another thing is changing the Import spec interface a bit to sort-of elevate the alternative to a higher level. I don't fully grasp the concept from the description but I guess it's ok to try and see how it works. If possible, please clarify the need and how it is not supplied by the current design?

manuelma avatar Feb 05 '21 08:02 manuelma

> If we don't want to use the datapackage connector I think we should just remove the datapackage functionality we have implemented, as it's becoming almost pointless?

I would agree - provided that we get practically the same functionality in this new approach.

> If possible, please clarify the need and how it is not supplied by the current design?

Let me put it this way - I'm proposing three 'open' slots where you can switch the contents as needed: a source table slot, a mapping slot and an alternative slot. And there can naturally be multiple rows of these. This gives the freedom to make the choices you happen to need at any particular import event, but still keep everything stored (from which to choose). This new way should also be considerably more intuitive to use. I guess the need is that at the moment it's all a bit confusing and it really needs to be clearer. The mappings may be made by an expert, but the regular user should still feel comfortable changing the contents of these slots (they would not be messing up anything that is hard to fix).

Thinking a bit further, it would also help to distinguish between something that is stored in a repository (mappings) and something that the user will regularly change.

jkiviluo avatar Feb 17 '21 09:02 jkiviluo