
Finding aids with nested collections index 'successfully', but then cause page crashes

Open mxMiles opened this issue 1 year ago • 2 comments

It is possible to index an EAD file that has nested collections. Once indexed, the finding aid causes page crashes. See the attached collectionInCollection.txt example file.

Expected behavior

I expect either the page to load or the file to fail indexing.

Actual behavior

The indexer will index this file without error. If you navigate to the finding aid, or it shows up in search results, the page will not load (the request returns a 500 Internal Server Error). The log file says:

I, [2024-05-08T09:51:40.077867 #1584162]  INFO -- : [f9ad2102-0c5f-4ac5-85b5-465bbc29519b] Started GET "/caoSearch/catalog?f%5Blevel%5D%5B%5D=Collection&f%5Brepository%5D%5B%5D=Unicorn+Test+Repository%3A+where+weird+data+comes+to+life%21" for 10.135.171.48 at 2024-05-08 09:51:40 -0400
I, [2024-05-08T09:51:40.078836 #1584162]  INFO -- : [f9ad2102-0c5f-4ac5-85b5-465bbc29519b] Processing by CatalogController#index as HTML
I, [2024-05-08T09:51:40.078904 #1584162]  INFO -- : [f9ad2102-0c5f-4ac5-85b5-465bbc29519b]   Parameters: {"f"=>{"level"=>["Collection"], "repository"=>["Unicorn Test Repository: where weird data comes to life!"]}}
I, [2024-05-08T09:51:40.209573 #1584162]  INFO -- : [f9ad2102-0c5f-4ac5-85b5-465bbc29519b]   Rendered vendor/bundle/ruby/3.1.0/bundler/gems/arclight-8852569afb4c/app/views/catalog/index.html.erb within layouts/blacklight (Duration: 64.5ms | Allocations: 48783)
I, [2024-05-08T09:51:40.209702 #1584162]  INFO -- : [f9ad2102-0c5f-4ac5-85b5-465bbc29519b]   Rendered layout vendor/bundle/ruby/3.1.0/gems/blacklight-8.1.0/app/views/layouts/blacklight.html.erb (Duration: 64.7ms | Allocations: 48812)
I, [2024-05-08T09:51:40.209902 #1584162]  INFO -- : [f9ad2102-0c5f-4ac5-85b5-465bbc29519b] Completed 500 Internal Server Error in 131ms (ActiveRecord: 1.5ms | Allocations: 90449)
F, [2024-05-08T09:51:40.212721 #1584162] FATAL -- : [f9ad2102-0c5f-4ac5-85b5-465bbc29519b]   
[f9ad2102-0c5f-4ac5-85b5-465bbc29519b] ActionView::Template::Error (id must be present for all documents and components):
[f9ad2102-0c5f-4ac5-85b5-465bbc29519b]     4: <% view_config = local_assigns[:view_config] || blacklight_config&.view_config(document_index_view_type) %>
[f9ad2102-0c5f-4ac5-85b5-465bbc29519b]     5: <div <%= 'id="documents"'.html_safe unless grouped? %> class="al-document-listings documents-<%= view_config&.key || document_index_view_type %>">
[f9ad2102-0c5f-4ac5-85b5-465bbc29519b]     6:   <% document_presenters = documents.map { |doc| document_presenter(doc) } -%>
[f9ad2102-0c5f-4ac5-85b5-465bbc29519b]     7:   <%= render view_config.document_component.with_collection(document_presenters, partials: view_config.partials, counter_offset: @response&.start || 0) %>
[f9ad2102-0c5f-4ac5-85b5-465bbc29519b]     8: </div>
[f9ad2102-0c5f-4ac5-85b5-465bbc29519b]   
[f9ad2102-0c5f-4ac5-85b5-465bbc29519b] arclight (8852569afb4c) lib/arclight/normalized_id.rb:23:in `normalize'
[f9ad2102-0c5f-4ac5-85b5-465bbc29519b] arclight (8852569afb4c) lib/arclight/normalized_id.rb:15:in `to_s'
[f9ad2102-0c5f-4ac5-85b5-465bbc29519b] arclight (8852569afb4c) app/models/arclight/parents.rb:19:in `eadid'
[f9ad2102-0c5f-4ac5-85b5-465bbc29519b] arclight (8852569afb4c) app/models/arclight/parents.rb:25:in `block in as_parents'
[f9ad2102-0c5f-4ac5-85b5-465bbc29519b] arclight (8852569afb4c) app/models/arclight/parents.rb:25:in `map'

Steps to reproduce

  1. Index the attached file.
  2. Navigate to the repository page for that finding aid, or search for it.

mxMiles avatar May 08 '24 14:05 mxMiles

Hi @archivalGrysbok -- we encountered this at Stanford too. You can see our workaround here: https://github.com/sul-dlss/stanford-arclight/blob/7245b905c50f5165cd46a4686bce9e24a3830493/app/models/solr_document.rb#L22 - we secretly interpret any lower-level "collections" as "series".
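To illustrate the idea for anyone skimming, here is a minimal sketch in plain Ruby. It is not the Stanford code linked above and it does not use ArcLight or Blacklight classes: `display_level` and the record hashes (`:id`, `:level`, `:top_level`) are hypothetical names made up for this example. The assumption is only that each record knows its level and whether it is the top-level description.

```ruby
# Hypothetical helper (not from ArcLight or stanford-arclight):
# report a lower-level "collection" as a "Series", and leave every
# other level, including the real top-level collection, unchanged.
def display_level(level, top_level:)
  return level if top_level

  level.to_s.casecmp?('collection') ? 'Series' : level
end

# Made-up records standing in for indexed documents.
records = [
  { id: 'abc123',     level: 'Collection', top_level: true },
  { id: 'abc123_c01', level: 'Collection', top_level: false },
  { id: 'abc123_c02', level: 'File',       top_level: false }
]

remapped = records.map { |r| display_level(r[:level], top_level: r[:top_level]) }
```

In this sketch only the nested collection changes label; the top-level record stays a collection, so collection-level handling applies to exactly one record per finding aid.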

Is it invalid EAD to have two level="collection" components in the same file? Is this a common thing in real archival data?

It is possible we could apply our Stanford solution to Core. Alternatively, in my opinion, having the indexer fail would be preferable to "successful" indexing with broken pages. I think this needs more discussion!
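For the "fail at index time" option, a rough sketch of a pre-index check might look like the following. It uses only REXML, which ships with Ruby, and is not part of ArcLight's indexer. `NestedCollectionError`, `nested_collections`, `assert_no_nested_collections!`, and the sample XML are all hypothetical. It also assumes that "nested collection" means any element other than `<archdesc>` carrying `level="collection"`.

```ruby
require 'rexml/document'

# Hypothetical error class for this sketch.
class NestedCollectionError < StandardError; end

# Return every element with level="collection" except <archdesc>,
# which is expected to be the one legitimate collection-level record.
def nested_collections(ead_xml)
  doc = REXML::Document.new(ead_xml)
  REXML::XPath.match(doc, "//*[@level='collection']")
              .reject { |el| el.name == 'archdesc' }
end

# Raise instead of letting the file index "successfully".
def assert_no_nested_collections!(ead_xml)
  nested = nested_collections(ead_xml)
  return if nested.empty?

  raise NestedCollectionError,
        "#{nested.size} component(s) below <archdesc> have level=\"collection\""
end

# Made-up EAD fragment with one nested collection.
SAMPLE_EAD = <<~XML
  <ead>
    <archdesc level="collection">
      <dsc>
        <c01 level="collection"><did><unittitle>Inner collection</unittitle></did></c01>
        <c01 level="series"><did><unittitle>Ordinary series</unittitle></did></c01>
      </dsc>
    </archdesc>
  </ead>
XML
```

Calling `assert_no_nested_collections!(SAMPLE_EAD)` raises here, so a wrapper script could skip or report the file before it ever reaches Solr. Whether that is better than remapping the level is the open question.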

marlo-longley avatar May 08 '24 17:05 marlo-longley

I don't think it's invalid, just not a best practice. ArchivesSpace allows collections within a collection.

I've only seen two repositories in the CAO do it, and both were small. We updated their finding aids so they'd index and let the repositories know about the problem, so they could fix it on their end going forward. I think they were trying to describe all their collections in a single finding aid.

My concern with secretly calling lower-level collections "series" is that then they wouldn't be findable as collections. Then again, I don't know that anyone is trying to do that. To me, having something findable (even if not entirely as described in the .xml) is preferable to dropping it on the floor.

I'll look into applying the Stanford fix to my test server.

mxMiles avatar May 08 '24 18:05 mxMiles