Problem: Duplicate "Level of description" terms created during CSV imports

Open alejandroarturom opened this issue 5 months ago • 2 comments

Current Behavior

Steps to reproduce the behavior

  1. Import a CSV with "Level of description" values that differ in capitalization/spelling from the canonical terms (e.g. Unidad Documental Simple instead of Unidad documental simple).
  2. After import, check the "Levels of description" taxonomy (id=34 in my DB).
  3. Duplicate terms are created:
    • Unidad Documental Simple (new, ID 5027)
    • Unidad Documental Compuesta (new, ID 5023)

Some descriptions (143 in my case) were assigned to the duplicate terms instead of the canonical ones (ID 241 = File, ID 242 = Item).

Expected Behavior

During CSV import, AtoM should match the provided values against the existing terms in the taxonomy (case-insensitive) and reuse them, instead of creating new duplicates for minor differences in capitalization or spelling.

Possible Solution

  • Normalize term matching during import (ignore case, possibly accent-insensitive).
  • Provide a clear warning in logs or in the UI when an incoming value does not match any existing term.
  • Alternatively, allow admins to enforce "strict matching only" to avoid silent creation of new terms.
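The normalization idea in the first bullet could look roughly like the sketch below. It is written in Python purely for illustration and is not AtoM's import code; the function names `normalize_term` and `find_existing_term`, and the sample taxonomy with its IDs, are hypothetical.

```python
import unicodedata
from typing import Dict, Optional


def normalize_term(name: str) -> str:
    """Return a comparison key that ignores case, accents, and extra whitespace."""
    # NFKD splits an accented letter into its base letter plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", name)
    # Drop the combining marks so "Colección" and "Coleccion" compare equal.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a stricter lower(); split/join collapses runs of whitespace.
    return " ".join(stripped.casefold().split())


def find_existing_term(value: str, terms: Dict[int, str]) -> Optional[int]:
    """Return the ID of the existing term that matches `value`, or None."""
    key = normalize_term(value)
    for term_id, name in terms.items():
        if normalize_term(name) == key:
            return term_id
    # No match: the caller decides whether to warn, reject, or create a term.
    return None


# Hypothetical taxonomy contents (term ID -> name), for illustration only.
levels = {1: "Unidad documental simple", 2: "Colección"}

match_id = find_existing_term("Unidad Documental Simple", levels)
```

With a lookup like this, an importer would reuse the matched ID and only fall back to warning about, or creating, a new term when the result is `None`.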

Context and Notes

No response

Version used

AtoM 2.9

Operating System and version

Ubuntu 22.04

Default installation culture

es

PHP version

8.3

Contact details

[email protected]

alejandroarturom • Sep 26 '25 02:09

This is an issue my organization has run into a number of times as well. Another possible solution I'd add to your list is a new check in the CSV Validator that outputs a warning when a value isn't an exact match for an existing term but does match it case-insensitively. That way there'd be no need to change how AtoM matches terms during import, since other institutions might already rely on the comparison being case-sensitive (unlikely, but possible).

For example, if your canonical term was "Unidad documental simple" but your CSV contained "Unidad Documental Simple" in the levelOfDescription column, validation would write a warning that a similar term already exists in the database.
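A rough sketch of such a check, in Python for illustration only. This is not the actual CSV Validator code: the function name `near_match_warnings`, the warning wording, and the sample rows are all assumptions. Only the `levelOfDescription` column name and the two term spellings come from this thread.

```python
import csv
import io


def near_match_warnings(csv_text, canonical_terms, column="levelOfDescription"):
    """Warn about values that match a canonical term only when case is ignored."""
    exact = set(canonical_terms)
    by_folded = {term.casefold(): term for term in canonical_terms}
    warnings = []
    # start=2 because line 1 of the file is the header row.
    for row_number, row in enumerate(csv.DictReader(io.StringIO(csv_text)), start=2):
        value = (row.get(column) or "").strip()
        if not value or value in exact:
            continue  # empty cell or exact match: nothing to report
        similar = by_folded.get(value.casefold())
        if similar is not None:
            warnings.append(
                f"Row {row_number}: '{value}' is not an exact match, "
                f"but a similar term '{similar}' already exists."
            )
    return warnings


# Made-up sample data: the first data row differs from the canonical term in case only.
sample = (
    "title,levelOfDescription\n"
    "Carta,Unidad Documental Simple\n"
    "Acta,Unidad documental simple\n"
)
warnings = near_match_warnings(sample, ["Unidad documental simple"])
```

Because the check only reports and never rewrites values, import behavior stays unchanged for institutions that depend on case-sensitive matching.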

danloveg • Sep 26 '25 13:09

I have this problem during XML imports, too. It not only creates duplicates when the upper/lower case doesn't match; it also created duplicate entries in the Levels of description taxonomy even when the case MATCHED! For example, I had two identically spelled "fond" entries in my L.o.d. taxonomy: one assigned to the newly imported records, the other to every older record of that level.

It seems to be a variation of the same bug, so it's probably not CSV-specific.

Starcos • Dec 10 '25 14:12