
Single dates exported from ArchivesSpace do not display

Status: Open. gwiedeman opened this issue 4 years ago (7 comments).

By default, ArchivesSpace permits three date types: inclusive, bulk, and single (and you can add more local types). Only inclusive and bulk are valid in EAD2002, so single dates export to EAD without a @type attribute.

During indexing, this method parses unitdates and adds them to normalized_date, which is used for display.

Currently, if a component has both an inclusive date and a single date, only the inclusive date is added to normalized_date, so the single date is not displayed to users.
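A minimal sketch of the failure mode, assuming an indexer that selects dates by @type value. This is illustrative Python, not Arclight's actual Ruby code; the sample component, the function name `normalized_date`, its `include_untyped` flag, and the "bulk ..." formatting are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical component: one inclusive date plus one single date, which
# ArchivesSpace exports as a <unitdate> with no @type.
DID = ET.fromstring("""
<did>
  <unittitle>Correspondence</unittitle>
  <unitdate type="inclusive">1990-1995</unitdate>
  <unitdate>2001</unitdate>
</did>
""".strip())


def normalized_date(did, include_untyped=False):
    """Join a component's dates for display (illustrative, not Arclight's code).

    With include_untyped=False this mimics the reported behavior: only dates
    whose @type is "inclusive" or "bulk" are collected, so untyped dates vanish.
    """
    dates = did.findall("unitdate")
    inclusive = [d.text for d in dates if d.get("type") == "inclusive"]
    untyped = [d.text for d in dates if d.get("type") is None]
    bulk = [d.text for d in dates if d.get("type") == "bulk"]

    parts = inclusive + (untyped if include_untyped else [])
    if bulk:
        parts.append("bulk " + ", ".join(bulk))
    return ", ".join(parts)


print(normalized_date(DID))                        # 1990-1995
print(normalized_date(DID, include_untyped=True))  # 1990-1995, 2001
```

Collecting untyped dates as a third group makes "2001" display, though the groups are still concatenated by type rather than in document order.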

gwiedeman, Mar 01 '21 20:03

Thanks for reporting, @gwiedeman. We noticed this issue at Duke, too, and made local modifications (especially to the normalize method): https://gitlab.oit.duke.edu/dul-its/dul-arclight/-/blob/develop/lib/dul_arclight/normalized_date.rb https://gitlab.oit.duke.edu/dul-its/dul-arclight/-/blob/develop/spec/lib/dul_arclight/normalized_date_spec.rb

It seems to be working well for us so far, but I imagine there might be variations across the community in how dates are treated.

seanaery, Mar 01 '21 20:03

relates to #1336

gwiedeman, Jul 20 '23 15:07

I have this up in the single_dates branch, and @seanaery's fix ensures that all <unitdate>s display, even when some have @type and some do not. However, it doesn't always preserve the order of the dates, as some archivists would expect. For example, the fixture I added has a <unitdate> without @type followed by an undated <unitdate> with @type="inclusive". Since Arclight currently indexes dates into separate fields by @type, it appends all the inclusive dates first, then the dates without @type, and the bulk dates last. If all your dates have the same @type (or no @type), the order is preserved, but not if they're mixed.
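The reordering can be reproduced with a small sketch, assuming per-type fields are concatenated in the order inclusive, untyped, bulk. The fixture below only imitates the one described (the "1920" value is hypothetical), and the functions are illustrative, not Arclight code.

```python
import xml.etree.ElementTree as ET

# Hypothetical fixture in the spirit of the one described: an untyped
# <unitdate> followed by an inclusive "undated" one.
DID = ET.fromstring("""
<did>
  <unitdate>1920</unitdate>
  <unitdate type="inclusive">undated</unitdate>
</did>
""".strip())

# Order in which a type-keyed indexer would emit its per-type fields.
TYPE_ORDER = ["inclusive", None, "bulk"]


def grouped_by_type(did):
    """Concatenate dates field by field, as when they are indexed per @type."""
    dates = did.findall("unitdate")
    result = []
    for date_type in TYPE_ORDER:
        result.extend(d.text for d in dates if d.get("type") == date_type)
    return result


def document_order(did):
    """findall() returns matching children in document order."""
    return [d.text for d in did.findall("unitdate")]


print(grouped_by_type(DID))  # ['undated', '1920']
print(document_order(DID))   # ['1920', 'undated']
```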

Changing this to always preserve date order would require reworking how Arclight handles dates. It is debatable how important the order of unitdates is. ASpace exports them in the same order they appear in the webapp, but it's unclear whether EAD promises to preserve date order. Many archivists do think date order is meaningful and enter dates in the order they expect to be most intuitive for users, so many would expect that order to be maintained.

Relatedly, per #1336, it might be best to eliminate some redundancy in what we store and index in Solr. If we maintain normalized_date_ssm, then normalized_title_ssm might not be necessary, and we could just join title_ssm and normalized_date_ssm in the view templates. I do think it's probably good to have both the individual dates and a display date string in the index.

What I'm thinking is to have a unitdate_ssim field that is a list of all dates in order, regardless of @type, and a unitdate_label_ssm field that is a parallel list of labels, like "inclusive" or "bulk", with an empty string ("") where there is no @type. During indexing, these would also be joined into normalized_title_ssm, similar to how it works now but with different logic. It doesn't seem ideal to rely on empty strings, but Arclight would probably only use normalized_title_ssm, and these fields would mostly serve to preserve the individual date data in the index. I might be making this more complicated than it needs to be, and there may be a better way to store this in Solr.
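A minimal sketch of the proposed parallel fields, assuming they are built in document order from a <did>. The field names come from the proposal above; the helper functions, the "bulk ..." prefix, and the "title, dates" join format are hypothetical, and the returned dict merely stands in for a Solr document.

```python
import xml.etree.ElementTree as ET


def unitdate_fields(did):
    """Build the two proposed parallel fields from a <did>, in document order.

    unitdate_ssim holds every date; unitdate_label_ssm holds the @type at the
    same index, or "" when the <unitdate> has no @type.
    """
    dates = did.findall("unitdate")
    return {
        "unitdate_ssim": [(d.text or "").strip() for d in dates],
        "unitdate_label_ssm": [d.get("type", "") for d in dates],
    }


def normalized_title(title, fields):
    """Join title and dates into one display string (hypothetical format)."""
    parts = []
    for date, label in zip(fields["unitdate_ssim"], fields["unitdate_label_ssm"]):
        parts.append(f"bulk {date}" if label == "bulk" else date)
    return ", ".join([title] + parts)


did = ET.fromstring(
    "<did><unittitle>Correspondence</unittitle>"
    '<unitdate type="inclusive">1990-2005</unitdate>'
    "<unitdate>2010</unitdate>"
    '<unitdate type="bulk">1995-2000</unitdate></did>'
)
fields = unitdate_fields(did)
title = did.findtext("unittitle")
print(fields["unitdate_label_ssm"])     # ['inclusive', '', 'bulk']
print(normalized_title(title, fields))  # Correspondence, 1990-2005, 2010, bulk 1995-2000
```

Because both lists share indexes, the empty-string placeholder is what keeps each label aligned with its date.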

We could also implement @seanaery's fix, either permanently (and not care about order) or temporarily, just to ensure everything displays for now, and make the ordering question into another ticket.

gwiedeman, Dec 13 '23 18:12

Thanks, @gwiedeman. The two links to Duke code I shared on this thread back in 2021 are now broken, but the code now lives here: https://gitlab.oit.duke.edu/dul-its/dul-arclight/-/blob/main/lib/traject/dul_arclight/normalized_date.rb https://gitlab.oit.duke.edu/dul-its/dul-arclight/-/blob/main/spec/lib/traject/dul_arclight/normalized_date_spec.rb

From a non-archivist perspective :-) ... what you propose (splitting into two fields, one for the dates and one for the labels) makes good sense to me. And I am always on board with removing unnecessary redundancy.

In the interest of bringing the community sprint in for a landing, you could PR your current branch to get an immediate fix in for the missing-dates bug, then pursue the enhanced refactor for better sequencing as a follow-up line of work. That way, if that effort extends beyond this sprint for whatever reason, we will still have made improvements for the impending release.

seanaery, Dec 13 '23 19:12

Unless we can confirm that EAD will preserve unit date order, it seems important from both an archivist and user perspective to maintain the unit date order, even if that means a different effort to change how ArcLight handles dates.

dinahhandel, Dec 13 '23 20:12

I agree that preserving date order is important. While "bulk" dates should appear last, single (no "type" attribute) and "inclusive" dates can appear in any order in a series of dates, e.g., 2002-2003, 2021 vs. 2002, 2020-2021 vs. 2002-2021 (bulk 2020-2021). I can also see "undated"/"n.d." date expressions causing trouble, depending on how (or whether) normalized dates are supplied.
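The three example date series can be checked with a small rendering sketch. It assumes each date arrives as an (expression, type) pair in source order, with None for a missing @type; the type assigned to each example date is a guess, and both functions are hypothetical. Following the convention above, bulk dates are gathered into a trailing "(bulk ...)" clause while all other dates keep their source order; `render_grouped` shows what happens if dates are first grouped by type instead.

```python
def render(dates):
    """Render (expression, type) pairs in the order given.

    type is None when the <unitdate> has no @type. Bulk dates are collected
    into a trailing "(bulk ...)" clause; all other dates keep their order.
    """
    main = [expr for expr, date_type in dates if date_type != "bulk"]
    bulk = [expr for expr, date_type in dates if date_type == "bulk"]
    text = ", ".join(main)
    if bulk:
        text += f" (bulk {', '.join(bulk)})"
    return text.strip()


def render_grouped(dates):
    """Same rendering after a stable sort by type: inclusive, untyped, bulk."""
    rank = {"inclusive": 0, None: 1, "bulk": 2}
    return render(sorted(dates, key=lambda pair: rank.get(pair[1], 1)))


examples = [
    [("2002-2003", "inclusive"), ("2021", None)],
    [("2002", None), ("2020-2021", "inclusive")],
    [("2002-2021", "inclusive"), ("2020-2021", "bulk")],
]
for dates in examples:
    print(render(dates), "|", render_grouped(dates))
# 2002-2003, 2021 | 2002-2003, 2021
# 2002, 2020-2021 | 2020-2021, 2002
# 2002-2021 (bulk 2020-2021) | 2002-2021 (bulk 2020-2021)
```

Only the second series changes under type grouping, which is the mixed-type case where source order carries meaning.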

mmmmcode, Dec 13 '23 21:12

Okay, I'm PR-ing the quick fix for now but also leaving this open. I think in practice, ASpace (and most tools) will preserve <unitdate> order through to Arclight, but I don't think XML promises that, so some tools might not.

Agreed that bulk dates and undated are usually last. Simply preserving the date order should let archivists set that order in ASpace or whatever tool they use.

gwiedeman, Dec 13 '23 22:12