
Problem: Accessions get duplicate fields when modified by a CSV import

Open amayita opened this issue 1 year ago • 2 comments

Current Behavior

Steps to reproduce the behavior

  1. Import the example accessions CSV listed here, unchanged, into a test server several times.

  2. Fields such as eventDates and alternativeIdentifiers will be doubled, tripled (and so on) with each new import.

According to the AtoM documentation, existing accession number data should be overwritten, but that's not happening for these two fields. But then we found this section in the documentation here, that says:

There are also additional fields that are not stored in AtoM's primary accession record database tables that can potentially receive new data via an update import. In these cases, existing data will not be replaced - instead, the update import will append new data to the existing resources. These fields typically include related entities such as donors, creators and event dates, physical storage, as well as alternative identifiers.

This might explain why the eventDates and alternativeIdentifiers fields are being doubled and not replaced.
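As a rough illustration of the difference between the two behaviors, here is a toy Python model. It is not AtoM's import code: the `update_import` function, the `title` field, and the split into overwritten and appended fields are assumptions made only to show why repeated imports of the same row multiply the appended fields.

```python
# Toy model only: this is NOT how AtoM is implemented.
# "title" is a hypothetical stand-in for a field stored on the primary
# accession record; the two appended field names come from this issue.
APPENDED_FIELDS = {"eventDates", "alternativeIdentifiers"}


def update_import(record, row):
    """Apply one CSV row to an existing record.

    Fields in APPENDED_FIELDS are added to a list (never replaced);
    every other field overwrites the stored value.
    """
    for field, value in row.items():
        if field in APPENDED_FIELDS:
            record.setdefault(field, []).append(value)
        else:
            record[field] = value
    return record


row = {
    "title": "Example accession",
    "eventDates": "2023-11-21",
    "alternativeIdentifiers": "ALT-001",
}

record = {}
for _ in range(3):  # import the same, unchanged row three times
    update_import(record, row)

print(record["title"])                    # overwritten: a single value
print(len(record["eventDates"]))          # appended: one entry per import
print(record["alternativeIdentifiers"])   # same value repeated per import
```

In this sketch the overwritten field stays a single value, while each appended field gains one entry per import, which matches the doubling and tripling described in the steps above.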

Expected Behavior

This should be fixed, or there should be a way to overwrite these fields without duplication.

Possible Solution

No response

Context and Notes

No response

Version used

AtoM 2.7.3

Operating System and version

Ubuntu 20

Default installation culture

en

PHP version

PHP 7.4

Contact details

No response

amayita avatar Nov 21 '23 17:11 amayita

Hi!

The reason for this is, I suspect, exactly the one that you found in the docs, and it's not an easy fix.

Honestly, the importing to overwrite behavior with the same Accession Number was a discovered happy accident, not a feature that was designed for support. In terms of development, we only ever officially added update import support for authorities, descriptions, and possibly repository records.

I probably should never have documented it as-is, but wanted to give users without any other recourse an option for making import updates. That said, I don't think "fixing" this will be easy or quick, and there are likely many other bugs to be prioritized instead, in my opinion.

If you really think something should be done about this, I would recommend instead that we handle this by updating the docs: adding WARNINGS to the documentation about this being an experimental unsupported feature, and listing more explicitly which fields can and can't be updated this way (based on an analyst doing some testing) for example.

fiver-watson avatar Nov 21 '23 18:11 fiver-watson

Update to add: on review of the documentation link provided, I think it is pretty explicit as-is. There is a list of fields that can be overwritten via import, and a separate list that can be added to, but will not overwrite. All the fields you list are in the second list. The feature is working exactly as described in the docs.

I would call this more of a Wish List enhancement item to make all CSV fields for all entities supportable for update imports, rather than a problem / bug report.

Given that, it's also a much bigger request, which is why support for this has not already been added. Additionally, there are many known issues with matching logic in other entities, for example (so at minimum roundtripping needs UI support); there is no roundtripping for other entities; delete and replace doesn't work with hierarchies, nor does it preserve relations, making it basically useless and worthy of consideration for removal; etc. Just some thoughts!

fiver-watson avatar Nov 22 '23 17:11 fiver-watson