Why does normalized_date exist?

Open jcoyne opened this issue 2 years ago • 6 comments

I see we copy normalized_date_ssm to the date_sort field, but aside from that it doesn't appear to be used. Do we need it?

jcoyne avatar Nov 18 '22 18:11 jcoyne

@jcoyne Recently we've seen that normalized_date_ssm is appended to the normalized_title and guessed that this was desired by archivists/people working with historical content. We allow it to be customized now via #1344. Not sure if that answers!

marlo-longley avatar Dec 01 '22 22:12 marlo-longley

I see that, but does it need to be stored in its own solr field?

jcoyne avatar Dec 02 '22 02:12 jcoyne

I think the normalized date is useful to store in its own Solr field for downstream applications to use if desired even if it is smushed into the normalized title by default in core. A couple examples I know of...

UAlbany has a nice collection list on a "repository" show page that splits the date out from the title: e.g., https://archives.albany.edu/description/repositories/apap and https://github.com/UAlbanyArchives/arclight-UAlbany/blob/main/app/views/arclight/repositories/show.html.erb#L45

Duke has a CSV exporter from bookmarks that we use to make digitization work orders, and isolating the date helps us map date metadata from archival components to digitized object metadata for our digital repository.

seanaery avatar Dec 02 '22 14:12 seanaery

@seanaery I think it would be better if we could either put those full features in Arclight or completely remove them from Arclight. Currently these features are sort of hanging between two worlds.

jcoyne avatar Dec 02 '22 15:12 jcoyne

@jcoyne That's a fair assessment and I appreciate that you're thinking about only storing in Solr the data that the core application expects to use. We are flexible at Duke and would just extend the traject rules locally to capture the normalized date atomically/additionally in its own Solr field if that disappears from core.

Caveat: I'm not an archivist. But the way I understand it is, this was developed as it is because proper titles of archival components (collections, too, but especially components) are often generic and repeated within a collection ("Letters," "Correspondence," "Newspaper Clippings"). Appending the dates in the places where the title typically appears gives the described entity some valuable distinction and context, e.g., in the html page <title>, in a list of sibling components, etc.

seanaery avatar Dec 02 '22 18:12 seanaery

There is duplication here and I think normalized_title_ssm is actually the field that is unnecessary. Yeah, there'll be lots of components like "Minutes, 1990" and "Minutes, 1991" that are distinguished by date and Arclight currently uses normalized_title_ssm to display this, but I feel like the date could just be appended to the title in the template, no? That way, both the title and date are still stored in a structured way in Solr. This would more easily facilitate #292, which is the dream.

I believe the reason for normalized_title_ssm is if you're searching "minutes 1991" in this case, but I'm not sure if including that in the index as a distinct field actually aids relevancy. If it does, then it probably shouldn't have to be stored.

The downside to this is that more logic would have to be in the template to handle date types like inclusive and bulk dates. But to me it makes sense to have well-structured data in Solr and have that logic be in the template rather than the data harvesting pipeline.
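To make the trade-off concrete, here is a rough sketch of the kind of view-side logic this would need. `display_title` is a hypothetical helper, not something that exists in Arclight, and the "inclusive, bulk ..." output format is an assumption for illustration only.

```ruby
# Hypothetical view helper (not part of Arclight): builds a display title
# from a title and separately stored date values.
# The "bulk ..." prefix and comma separators are assumed conventions.
def display_title(title, inclusive: nil, bulk: nil, other: nil)
  # Treat nil and whitespace-only strings as missing.
  present = ->(value) { value && !value.strip.empty? }

  date_parts = []
  date_parts << inclusive.strip if present.call(inclusive)
  date_parts << "bulk #{bulk.strip}" if present.call(bulk)
  date_parts << other.strip if present.call(other)

  pieces = []
  pieces << title.strip if present.call(title)
  pieces << date_parts.join(', ') unless date_parts.empty?
  pieces.join(', ')
end

puts display_title('Minutes', inclusive: '1990-1995', bulk: '1991-1992')
puts display_title('Minutes', inclusive: '1991')
puts display_title('Correspondence')
```

A helper like this could be called from a template with whatever title and date fields are stored, so the combined string never has to live in Solr.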

I'm also not sure that "normalized" is the best descriptor here for what Arclight is doing. Typically archivists use two date fields, a required well-structured date and what ASpace calls a date "expression" that is optional. That way you can have a publication that has a displayed date expression of "Fall 2002" that also has a date like "2002-09" for sorting. Prior to Arclight 1.0 at least, Arclight used unitdate_ssm as a list of well-structured dates and normalized_date_ssm for the date expression.
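As a small illustration of the two-field idea, the sketch below sorts on the structured date and displays the expression. The hash keys and the `labels_in_date_order` method are hypothetical, and apart from "Fall 2002" / "2002-09" the sample values are made up.

```ruby
# Hypothetical sketch: each record carries a free-text date expression for
# display and a structured date (zero-padded, ISO 8601 style) for sorting.
def labels_in_date_order(records)
  records
    .sort_by { |record| record[:structured_date] } # string sort is chronological for zero-padded dates
    .map { |record| "#{record[:title]}, #{record[:date_expression]}" }
end

records = [
  { title: 'Newsletter', date_expression: 'Fall 2002',   structured_date: '2002-09' },
  { title: 'Newsletter', date_expression: 'Spring 2002', structured_date: '2002-03' },
  { title: 'Newsletter', date_expression: 'Winter 2001', structured_date: '2001-12' }
]

puts labels_in_date_order(records)
```

Sorting on the expression alone would put "Fall 2002" before "Spring 2002", which is why the structured value is kept separately.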

gwiedeman avatar Jul 20 '23 13:07 gwiedeman