metacatui

Leverage Solr Graph Queries for package traversal

Status: Open. robyngit opened this issue 8 months ago (2 comments)

Explore using Solr's graph queries and streaming expressions to retrieve package contents and version history. This would speed up tasks such as resolving current versions and displaying obsolescence chains in MetacatUI. It would also allow us to expose the version history of portals, provide views and tools for exploring object versions, and provide navigation between earlier and later object versions.

The CN would need to be updated to a newer version of Solr.

See https://datadavev.github.io/solr-property-graph/graph_traversal01.html

robyngit, Mar 26 '25 21:03

The graph query type in Solr traverses a graph of records whose edges are defined by two fields, FROM and TO: at each step, the TO values of the current set of documents are collected, and every document whose FROM field contains one of those values joins the next set. The traversal is applied recursively, beginning with a starting set of documents defined by a query. Basically:

{!graph from=FROM to=TO}STARTING_SET
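To make the FROM/TO direction concrete, here is a minimal in-memory sketch of the traversal just described. It assumes documents are plain dicts whose fields hold lists of strings; the function name graph_traverse and its parameters are hypothetical, and this only mimics the behavior of the Solr query parser rather than reproducing its implementation. It also models returnRoot (which defaults to true in Solr) and a traversalFilter, written here as a Python predicate.

```python
from typing import Callable, Dict, List, Optional

# Hypothetical document shape: every field holds a list of string values.
Doc = Dict[str, List[str]]


def graph_traverse(
    docs: List[Doc],
    is_start: Callable[[Doc], bool],
    from_field: str,
    to_field: str,
    return_root: bool = True,
    traversal_filter: Optional[Callable[[Doc], bool]] = None,
) -> List[Doc]:
    """Breadth-first sketch of {!graph from=FROM to=TO}STARTING_SET."""
    # Track documents by list position so the dicts need not be hashable.
    root = [i for i, d in enumerate(docs) if is_start(d)]
    visited = set(root)
    frontier = root
    found: List[int] = []
    while frontier:
        # Collect the TO values of the current frontier...
        wanted = {v for i in frontier for v in docs[i].get(to_field, [])}
        # ...and add unvisited documents whose FROM field contains one of them,
        # provided they pass the (optional) traversal filter.
        frontier = [
            i
            for i, d in enumerate(docs)
            if i not in visited
            and wanted.intersection(d.get(from_field, []))
            and (traversal_filter is None or traversal_filter(d))
        ]
        visited.update(frontier)
        found.extend(frontier)
    return [docs[i] for i in (root if return_root else []) + found]
```

Swapping from_field and to_field reverses the direction of travel, which is exactly the trick used further down in this thread.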

When an OAI-ORE resource map is indexed in DataONE, three Solr fields are populated: resourceMap, documents, and documentedBy. For all documents aggregated by an OAI-ORE resource map, the resourceMap property is populated with the identifier of the aggregating resource map. We can use the resourceMap property to navigate the graph of items in a resource map and any linked resource maps.
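A simplified model of how the resourceMap property ends up populated: the sketch below inverts a hypothetical mapping from each resource map to the identifiers it aggregates. It is not the DataONE indexer's code, and the dict input is an assumption. In this model, a child resource map aggregated by a parent map gets the parent's identifier in its own resourceMap value, which is what lets a traversal reach linked resource maps.

```python
from typing import Dict, List


def resource_map_field(aggregations: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert {resource map id: aggregated ids} into {doc id: resourceMap values}."""
    field: Dict[str, List[str]] = {}
    for ore_id, members in aggregations.items():
        for member in members:
            # A document aggregated by several maps (e.g. successive package
            # versions) ends up with one resourceMap value per map.
            field.setdefault(member, []).append(ore_id)
    return field
```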

Given the PID of a metadata document, we can find the containing resource map by examining the resourceMap field of that metadata document in the index, like so:

{!graph from=id to=resourceMap}id:PID
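Since these local-params strings get nested and long, it can help to build them programmatically. The helper below is a hypothetical sketch (graph_query and NOT_OBSOLETED are illustrative names); it only emits the from, to, traversalFilter, returnRoot, and maxDepth parameters that come up in this thread and does no escaping, so a filter containing a single quote would break it.

```python
from typing import Optional

# The traversal filter used throughout this thread.
NOT_OBSOLETED = "(*:* AND -obsoletedBy:*)"


def graph_query(from_field: str, to_field: str, start: str,
                traversal_filter: Optional[str] = None,
                return_root: Optional[bool] = None,
                max_depth: Optional[int] = None) -> str:
    """Compose a {!graph ...} local-params query (hypothetical helper)."""
    params = [f"from={from_field}", f"to={to_field}"]
    if traversal_filter is not None:
        # Single quotes keep the spaces in the filter inside one parameter value.
        params.append(f"traversalFilter='{traversal_filter}'")
    if return_root is not None:
        params.append(f"returnRoot={'true' if return_root else 'false'}")
    if max_depth is not None:
        params.append(f"maxDepth={max_depth}")
    # `start` may itself be another {!graph ...} query, which nests the traversal.
    return "{!graph " + " ".join(params) + "}" + start
```

For instance, graph_query("id", "resourceMap", "id:PID") reproduces the query above, and passing one graph_query result as the start of another produces the nested form used in the next steps.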

Obsoleted resource maps can be excluded by adding a traversalFilter (the *:* is required because a purely negative query such as -obsoletedBy:* may match no documents on its own):

{!graph from=id to=resourceMap traversalFilter='(*:* AND -obsoletedBy:*)'}id:PID

That gives us the non-obsoleted resource map containing PID (plus PID itself, since returnRoot defaults to true; adding returnRoot=false drops it). We can then use that as the starting set and reverse the traversal to find all documents in the resource map:

{!graph to=id from=resourceMap traversalFilter='(*:* AND -obsoletedBy:*)'}{!graph from=id to=resourceMap traversalFilter='(*:* AND -obsoletedBy:*)'}id:PID
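The two-phase lookup (climb to the current resource map, then walk back down) can be sketched in memory against a toy index. Everything here is hypothetical: the record ids, the INDEX/BY_ID structures, and the walk/package_contents names. It only mirrors the query logic under the field semantics described above, with a Python check standing in for the traversalFilter.

```python
from typing import Dict, List, Set

# Toy index (hypothetical): ore_v2 obsoletes ore_v1 and aggregates a child map.
INDEX: List[Dict[str, List[str]]] = [
    {"id": ["ore_v1"], "obsoletedBy": ["ore_v2"]},
    {"id": ["ore_v2"]},
    {"id": ["ore_child"], "resourceMap": ["ore_v2"]},
    {"id": ["meta"], "resourceMap": ["ore_v1", "ore_v2"]},
    {"id": ["data_a"], "resourceMap": ["ore_v1", "ore_v2"]},
    {"id": ["data_b"], "resourceMap": ["ore_child"]},
]
BY_ID = {d["id"][0]: d for d in INDEX}


def current(doc_id: str) -> bool:
    # Stand-in for traversalFilter='(*:* AND -obsoletedBy:*)'.
    return doc_id in BY_ID and not BY_ID[doc_id].get("obsoletedBy")


def walk(start: Set[str], upward: bool) -> Set[str]:
    """Follow resourceMap edges from start; returns only newly reached ids.

    upward=True:  like {!graph from=id to=resourceMap ...} (record -> its maps).
    upward=False: like {!graph to=id from=resourceMap ...} (map -> its members).
    """
    seen, frontier = set(start), set(start)
    while frontier:
        if upward:
            reached = {m for i in frontier for m in BY_ID.get(i, {}).get("resourceMap", [])}
        else:
            reached = {d["id"][0] for d in INDEX if frontier.intersection(d.get("resourceMap", []))}
        frontier = {i for i in reached - seen if current(i)}
        seen |= frontier
    return seen - set(start)


def package_contents(pid: str) -> Set[str]:
    maps = walk({pid}, upward=True)  # inner query with returnRoot=false
    return maps | walk(maps, upward=False)  # outer query keeps its roots
```

In this model, starting from data_b (which sits only in the child map) the upward walk also climbs to ore_v2, so the whole current package comes back, while ore_v1 never appears because the filter stops traversal there.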

An actual example is:

{!graph to=id from=resourceMap traversalFilter='(*:* AND -obsoletedBy:*)'}{!graph from=id to=resourceMap traversalFilter='(*:* AND -obsoletedBy:*)' returnRoot=false}(id:"urn%3Auuid%3A1ee20ad0-9fec-4b24-bd0d-d0d55aefa49d")

Expressed as a URL (574 matches):

https://arcticdata.io/metacat/d1/mn/v2/query/solr/?q=%7B%21graph+to%3Did+from%3DresourceMap+traversalFilter%3D%27%28%2A%3A%2A+AND+-obsoletedBy%3A%2A%29%27%7D%7B%21graph+from%3Did+to%3DresourceMap+traversalFilter%3D%27%28%2A%3A%2A+AND+-obsoletedBy%3A%2A%29%27+returnRoot%3Dfalse%7D%28id%3A%22urn%253Auuid%253A1ee20ad0-9fec-4b24-bd0d-d0d55aefa49d%22%29&wt=json&fl=id%2CformatId%2CresourceMap%2CobsoletedBy%2Cobsoletes&rows=100&start=0
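The URL above is the query string percent-encoded with the usual form-encoding rules, so it can be regenerated with Python's standard library rather than by hand. A small sketch follows (solr_url, BASE, FIELDS, and EXAMPLE_Q are hypothetical names; the endpoint and field list are copied from the URL). Note that the % signs already inside the PID are encoded again as %25, which matches the URL above. Since rows=100, only the first 100 of the matches are returned per request; advancing start pages through the rest.

```python
from urllib.parse import urlencode

# Endpoint and field list copied from the URL above.
BASE = "https://arcticdata.io/metacat/d1/mn/v2/query/solr/"
FIELDS = "id,formatId,resourceMap,obsoletedBy,obsoletes"

EXAMPLE_Q = (
    "{!graph to=id from=resourceMap traversalFilter='(*:* AND -obsoletedBy:*)'}"
    "{!graph from=id to=resourceMap traversalFilter='(*:* AND -obsoletedBy:*)' returnRoot=false}"
    '(id:"urn%3Auuid%3A1ee20ad0-9fec-4b24-bd0d-d0d55aefa49d")'
)


def solr_url(q: str, rows: int = 100, start: int = 0) -> str:
    """Build a query URL; urlencode uses quote_plus, so spaces become '+'."""
    params = [("q", q), ("wt", "json"), ("fl", FIELDS), ("rows", rows), ("start", start)]
    return BASE + "?" + urlencode(params)
```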

The depth of traversal can be limited by specifying maxDepth. For example, in the query above, we can limit the traversal to the top level of the resource map hierarchy:

{!graph to=id from=resourceMap traversalFilter='(*:* AND -obsoletedBy:*)' maxDepth=1}{!graph from=id to=resourceMap traversalFilter='(*:* AND -obsoletedBy:*)' returnRoot=false}(id:"urn%3Auuid%3A1ee20ad0-9fec-4b24-bd0d-d0d55aefa49d")

As a URL (36 results):

https://arcticdata.io/metacat/d1/mn/v2/query/solr/?q=%7B%21graph+to%3Did+from%3DresourceMap+traversalFilter%3D%27%28%2A%3A%2A+AND+-obsoletedBy%3A%2A%29%27+maxDepth%3D1%7D%7B%21graph+from%3Did+to%3DresourceMap+traversalFilter%3D%27%28%2A%3A%2A+AND+-obsoletedBy%3A%2A%29%27+returnRoot%3Dfalse%7D%28id%3A%22urn%253Auuid%253A1ee20ad0-9fec-4b24-bd0d-d0d55aefa49d%22%29&wt=json&fl=id%2CformatId%2CresourceMap%2CobsoletedBy%2Cobsoletes&rows=100&start=0
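Why maxDepth=1 stops at the top level: each hop of the downward walk descends one level of nesting, so a child resource map is reached at depth 1 but its own members only at depth 2. A toy sketch of that effect follows; the package layout and the RESOURCE_MAP/members names are hypothetical, and roots are left out of the result for readability (as if returnRoot=false).

```python
from typing import Dict, List, Set

# Hypothetical nested package: ore_parent aggregates meta and ore_child,
# and ore_child aggregates two data files. Values mimic the resourceMap field.
RESOURCE_MAP: Dict[str, List[str]] = {
    "meta": ["ore_parent"],
    "ore_child": ["ore_parent"],
    "data_1": ["ore_child"],
    "data_2": ["ore_child"],
}


def members(roots: Set[str], max_depth: int = -1) -> Set[str]:
    """Downward walk like {!graph to=id from=resourceMap maxDepth=N}.

    max_depth < 0 means unlimited (Solr's default is -1).
    """
    seen, frontier, depth = set(roots), set(roots), 0
    while frontier and (max_depth < 0 or depth < max_depth):
        # One hop: records whose resourceMap contains something in the frontier.
        frontier = {
            doc for doc, maps in RESOURCE_MAP.items()
            if doc not in seen and frontier.intersection(maps)
        }
        seen |= frontier
        depth += 1
    return seen - roots
```

Here members({"ore_parent"}, max_depth=1) yields only meta and ore_child, while the unlimited walk also reaches data_1 and data_2.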

datadavev, Mar 27 '25 18:03

Just for kicks, here's a quick visualization of the above. Red = ORE docs, green = metadata, blue = everything else.

[Image: graph visualization of the package contents]

notebook: https://gist.github.com/datadavev/de006d87eaa25847c260081164c7c398
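Not the notebook's code, but a minimal sketch of one way to turn the JSON returned by the URLs above into a Graphviz DOT graph using the same color scheme. It assumes the standard Solr JSON layout (response.docs), single-valued id and formatId, and a list-valued resourceMap. ORE_FORMAT is the DataONE format ID for OAI-ORE; which formatIds count as metadata is left to the caller because that classification is an assumption here. The to_dot name is hypothetical.

```python
import json
from typing import Any, Dict, List, Set

# DataONE format ID for OAI-ORE resource maps.
ORE_FORMAT = "http://www.openarchives.org/ore/terms"


def to_dot(response_json: str, metadata_formats: Set[str]) -> str:
    """Turn a Solr JSON response (wt=json) into a Graphviz digraph.

    Nodes are colored red (ORE), green (metadata) or blue (everything else);
    edges run from each record to the resource maps that aggregate it.
    """
    docs: List[Dict[str, Any]] = json.loads(response_json)["response"]["docs"]
    ids = {d["id"] for d in docs}
    lines = ["digraph package {"]
    for d in docs:
        fmt = d.get("formatId", "")
        color = "red" if fmt == ORE_FORMAT else "green" if fmt in metadata_formats else "blue"
        # Assumes ids contain no double quotes (they are not escaped here).
        lines.append(f'  "{d["id"]}" [color={color}];')
        for ore in d.get("resourceMap", []):
            if ore in ids:  # skip maps outside the result, e.g. obsoleted ones
                lines.append(f'  "{d["id"]}" -> "{ore}";')
    lines.append("}")
    return "\n".join(lines)
```

The output can be rendered with the Graphviz dot tool.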

datadavev, Mar 27 '25 20:03