hyrax icon indicating copy to clipboard operation
hyrax copied to clipboard

Use solr's graph query builder for nested collection queries

Open cjcolvar opened this issue 3 years ago • 6 comments

This can replace the expensive nested indexing using samvera-nesting_indexer and all of the special handling for it.

Note: the graph query builder does not work with multi-sharded solr instances

All nesting indexing has been removed along with unused methods/parameters. Deprecation notices will be added to a future backport PR to 3.x.

cjcolvar avatar Sep 09 '22 21:09 cjcolvar

This should also fix #5596, #5595, and #5492.

cjcolvar avatar Sep 12 '22 15:09 cjcolvar

This sounds great! Has there been any performance testing with meaningful/large hyrax data sets? Would perhaps using the traversalFilter graph query option provide any benefit?

dlpierce avatar Sep 12 '22 17:09 dlpierce

this looks great to me.

since the performance of the old implementation is extremely bad, it seems so likely that the performance will be improved. i'd like to suggest we move forward with this and leave any testing for post merge.

i'd also like to look at a limited backport of this, keeping the old indexing behavior in place for 3.x by default, but introducing the new query behavior. i'll poke at this later today.

no-reply avatar Sep 12 '22 18:09 no-reply

@dlpierce I've done some very limited manual testing and haven't seen any issues with solr. I'm not sure if the traversalFilter is useful for our use case or not. I'd be curious about performance as nesting depth scales but I'd suspect solr would handle most normal cases easily.

cjcolvar avatar Sep 12 '22 19:09 cjcolvar

What's the right way to document that Hyrax 4.0 doesn't support sharded SolrCloud? it seems like we need to write that down somewhere, even if it's unlikely to impact any current users.

no-reply avatar Sep 12 '22 21:09 no-reply

What's the right way to document that Hyrax 4.0 doesn't support sharded SolrCloud? it seems like we need to write that down somewhere, even if it's unlikely to impact any current users.

I would think a note on this page https://samvera.github.io/service-stack.html (which could stand some expansion and more prominant linking in the readme, but that's a different ticket) and a comment in app/search_builders/hyrax/dashboard/nested_collections_search_builder.rb when we talk about the graph search. (because that's where things will break if you multishard).

In a perfect world, we would throw an exception or send a notice or something on startup if multiple shards are detected, but I don't think we even mandate solr cloud usage, so I'm not sure how that API call would be reliable and again, I feel like "Check if Solr configuration is sane on start up" is a whole new ticket.

orangewolf avatar Sep 13 '22 04:09 orangewolf