Use solr's graph query builder for nested collection queries
This can replace the expensive nested indexing using samvera-nesting_indexer and all of the special handling for it.
Note: the graph query builder does not work with multi-sharded solr instances
All nesting indexing has been removed along with unused methods/parameters. Deprecation notices will be added to a future backport PR to 3.x.
This should also fix #5596, #5595, and #5492.
This sounds great! Has there been any performance testing with meaningful/large hyrax data sets? Would perhaps using the traversalFilter graph query option provide any benefit?
this looks great to me.
since the performance of the old implementation is extremely bad, it seems so likely that the performance will be improved. i'd like to suggest we move forward with this and leave any testing for post merge.
i'd also like to look at a limited backport of this, keeping the old indexing behavior in place for 3.x by default, but introducing the new query behavior. i'll poke at this later today.
@dlpierce I've done some very limited manual testing and haven't seen any issues with solr. I'm not sure if the traversalFilter is useful for our use case or not. I'd be curious about performance as nesting depth scales but I'd suspect solr would handle most normal cases easily.
What's the right way to document that Hyrax 4.0 doesn't support sharded SolrCloud? it seems like we need to write that down somewhere, even if it's unlikely to impact any current users.
What's the right way to document that Hyrax 4.0 doesn't support sharded SolrCloud? it seems like we need to write that down somewhere, even if it's unlikely to impact any current users.
I would think a note on this page https://samvera.github.io/service-stack.html (which could stand some expansion and more prominant linking in the readme, but that's a different ticket) and a comment in app/search_builders/hyrax/dashboard/nested_collections_search_builder.rb when we talk about the graph search. (because that's where things will break if you multishard).
In a perfect world, we would throw an exception or send a notice or something on startup if multiple shards are detected, but I don't think we even mandate solr cloud usage, so I'm not sure how that API call would be reliable and again, I feel like "Check if Solr configuration is sane on start up" is a whole new ticket.