metafacture-core

Bug: `encode-csv` mixes up columns in a two-column CSV

Open TobiasNx opened this issue 11 months ago • 2 comments

In my example here: https://github.com/TobiasNx/metafacture_workflows/commit/16308bc44ab961f3beaeef0497479e1124aedc09

The CSV output sometimes has its columns mixed up. This seems to be due to the order of the incoming stream:

Hochschulbibliothek Pforzheim, Bereichsbibliothek Technik und Wirtschaft	http://lobid.org/organisations/DE-951#!
http://lobid.org/organisations/DE-1a#!	Staatsbibliothek zu Berlin - Preußischer Kulturbesitz, Haus Potsdamer Straße
Hochschularchiv der ETH Zürich	http://lobid.org/organisations/CH-001807-7#!
Heimatgeschichtliches Museum Modautal	http://lobid.org/organisations/DE-MUS-265910#!
Museum Johannes Reuchlin MJR	http://lobid.org/organisations/DE-MUS-492617#!

If I output JSON instead, the issue seems to be caused by variation in the field order:

{
  "name" : "früher: Frankfurt/Main; Institut für Rechtsgeschichte, Bibliothek",
  "id" : "http://lobid.org/organisations/DE-30-163#!"
}
{
  "name" : "Hochschulbibliothek Pforzheim, Bereichsbibliothek Technik und Wirtschaft",
  "id" : "http://lobid.org/organisations/DE-951#!"
}
{
  "id" : "http://lobid.org/organisations/DE-1a#!",
  "name" : "Staatsbibliothek zu Berlin - Preußischer Kulturbesitz, Haus Potsdamer Straße"
}
{
  "name" : "Hochschularchiv der ETH Zürich",
  "id" : "http://lobid.org/organisations/CH-001807-7#!"
}
{
  "name" : "Heimatgeschichtliches Museum Modautal",
  "id" : "http://lobid.org/organisations/DE-MUS-265910#!"
}

TobiasNx, Jul 17 '23 13:07

Currently, the CSV encoder writes literals (values) as they come in, without giving any regard to their names. Hence, if the input order is unstable, the output will be inconsistent.
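The behavior described above can be illustrated with a minimal Python sketch. This is not the actual metafacture `encode-csv` code (which is Java); `encode_positional` and the `records` structure are hypothetical names used only to show what happens when values are written in arrival order and their names are ignored.

```python
import csv
import io

# Two records taken from the JSON output in this issue, modeled as
# (name, value) pairs in arrival order. The second record arrives
# with "id" before "name".
records = [
    [("name", "Hochschularchiv der ETH Zürich"),
     ("id", "http://lobid.org/organisations/CH-001807-7#!")],
    [("id", "http://lobid.org/organisations/DE-1a#!"),
     ("name", "Staatsbibliothek zu Berlin - Preußischer Kulturbesitz, Haus Potsdamer Straße")],
]


def encode_positional(records):
    """Write each record's values in arrival order, ignoring field names."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for record in records:
        # Only the values are written; the names are discarded.
        writer.writerow([value for _name, value in record])
    return out.getvalue()


print(encode_positional(records))
```

In this sketch the first row starts with the name and the second row starts with the URL, which matches the mixed-up columns shown in the issue description.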

A potential solution might be to write values in the order they were first received, which is also the order of the column headers. But this will get somewhat complicated when also taking repeated fields into account.

blackwinter, Jul 28 '23 10:07

Task: map incoming values to the header order; if a field does not yet exist in the header, add a new column for it.
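One way to picture this task is the following Python sketch. It is only an illustration under stated assumptions, not the metafacture implementation: `encode_by_header` is a hypothetical name, all records are buffered so the header is complete before writing, and a repeated field simply keeps its last value, which sidesteps the repeated-field question raised earlier in the thread.

```python
import csv
import io


def encode_by_header(records):
    """Write values under their column, with columns in first-seen order."""
    header = []  # column names in the order they were first received
    rows = []    # one name -> value mapping per record
    for record in records:
        row = {}
        for name, value in record:
            if name not in header:
                header.append(name)  # unknown field: add a new column
            row[name] = value        # assumption: last value wins on repeats
        rows.append(row)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=header, restval="",
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing fields are filled with restval
    return out.getvalue()


example = [
    [("name", "A"), ("id", "1")],
    [("id", "2"), ("name", "B")],   # swapped order, still aligned
    [("id", "3"), ("type", "x")],   # new field adds a third column
]
print(encode_by_header(example))
```

Buffering keeps the sketch short. A streaming encoder that has already written its header would need a different strategy for columns that first appear in a later record.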

TobiasNx, Sep 21 '23 14:09