metacatui
Leverage Solr Graph Queries for package traversal
Explore using Solr's graph queries and streaming expressions to retrieve package contents and version history. This would speed up tasks such as resolving current versions and displaying obsolescence chains in MetacatUI. It would allow us to expose the version history of portals, provide views and tools for exploring object versions, and provide navigation between earlier and later object versions.
The CN (Coordinating Node) would need to be updated to a newer version of Solr.
See https://datadavev.github.io/solr-property-graph/graph_traversal01.html
The graph query type in Solr traverses a graph of records whose edges are defined by two fields: FROM, the field on candidate documents whose values identify them as nodes, and TO, the field on the current documents whose values the FROM field has to match. The traversal is applied recursively, starting from the set of documents matched by a query. Basically:
{!graph from=FROM to=TO}STARTING_SET
When an OAI-ORE resource map is indexed in DataONE, three Solr fields are populated: resourceMap, documents, and documentedBy. For every document aggregated by an OAI-ORE resource map, the resourceMap field is populated with the identifier of the aggregating resource map. We can use the resourceMap field to navigate the graph of items in a resource map and any linked resource maps.
Given a PID for a metadata document, we can find the containing resource map document by following the resourceMap field of the metadata document in the index. For example:
{!graph from=id to=resourceMap}id:PID
Obsoleted resource maps can be excluded by including a traversalFilter (the *:* is required because a purely negative clause can end up matching nothing when used by itself):
{!graph from=id to=resourceMap traversalFilter='(*:* AND -obsoletedBy:*)'}id:PID
That gives us the non-obsoleted resource map containing PID. Then we can use that as the starting set and reverse the traversal to find all documents in the resource map:
{!graph to=id from=resourceMap traversalFilter='(*:* AND -obsoletedBy:*)'}
{!graph from=id to=resourceMap traversalFilter='(*:* AND -obsoletedBy:*)'}id:PID
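To make the two-step traversal concrete, here is a toy in-memory sketch in Python. It is not how Solr implements graph queries; it only mimics the behaviour described above on a made-up index. The document ids, the `graph_traverse` helper and its parameter names are all hypothetical, and it assumes the traversal filter applies to documents reached during traversal, not to the starting set.

```python
# Toy illustration of {!graph from=... to=...} semantics on an in-memory "index".
# All ids and helper names here are made up for the example.

def as_values(doc, field):
    """Return a field's values as a set (fields may be single or multi-valued)."""
    value = doc.get(field)
    if value is None:
        return set()
    if isinstance(value, (list, tuple, set)):
        return set(value)
    return {value}


def graph_traverse(docs, roots, from_field, to_field,
                   traversal_filter=None, max_depth=-1, return_root=True):
    """Breadth-first walk: collect to_field values from the current frontier,
    then match unseen documents whose from_field holds one of those values."""
    root_ids = {d["id"] for d in roots}
    seen = set(root_ids)
    frontier = list(roots)
    depth = 0
    while frontier and (max_depth < 0 or depth < max_depth):
        edge_values = set()
        for d in frontier:
            edge_values |= as_values(d, to_field)
        # Assumption: the filter is applied to traversed documents only.
        frontier = [
            d for d in docs
            if d["id"] not in seen
            and as_values(d, from_field) & edge_values
            and (traversal_filter is None or traversal_filter(d))
        ]
        seen |= {d["id"] for d in frontier}
        depth += 1
    return sorted(seen if return_root else seen - root_ids)


INDEX = [
    {"id": "rm_old", "obsoletedBy": "rm_new"},
    {"id": "rm_new"},
    {"id": "meta", "resourceMap": ["rm_old", "rm_new"]},
    {"id": "data1", "resourceMap": ["rm_new"]},
    {"id": "data2", "resourceMap": ["rm_new"]},
    {"id": "old_data", "resourceMap": ["rm_old"], "obsoletedBy": "data1"},
]
BY_ID = {d["id"]: d for d in INDEX}


def not_obsoleted(doc):
    """Stand-in for traversalFilter='(*:* AND -obsoletedBy:*)'."""
    return "obsoletedBy" not in doc


# Step 1: from the metadata document up to its non-obsoleted resource map.
maps = graph_traverse(INDEX, [BY_ID["meta"]], from_field="id",
                      to_field="resourceMap", traversal_filter=not_obsoleted,
                      return_root=False)

# Step 2: reverse the direction to collect everything in that resource map.
members = graph_traverse(INDEX, [BY_ID[m] for m in maps],
                         from_field="resourceMap", to_field="id",
                         traversal_filter=not_obsoleted)

print(maps)     # ['rm_new']
print(members)  # ['data1', 'data2', 'meta', 'rm_new']
```

In the toy index, the obsoleted resource map and the obsoleted data object are both skipped, which is the effect the traversalFilter is meant to have in the real queries.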
An actual example is:
{!graph to=id from=resourceMap traversalFilter='(*:* AND -obsoletedBy:*)'}{!graph from=id to=resourceMap traversalFilter='(*:* AND -obsoletedBy:*)' returnRoot=false}(id:"urn%3Auuid%3A1ee20ad0-9fec-4b24-bd0d-d0d55aefa49d")
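Nested queries like this get hard to read quickly, so it may be easier to compose them programmatically. A minimal Python sketch follows; the `graph_query` helper is hypothetical, and it assumes the filter string contains no single quotes. It emits the local parameters in a fixed order (from, to, ...), which differs from the hand-written example but should be equivalent.

```python
from typing import Optional

# The filter used throughout this issue to skip obsoleted documents.
NOT_OBSOLETED = "(*:* AND -obsoletedBy:*)"


def graph_query(start: str, from_field: str, to_field: str,
                traversal_filter: Optional[str] = None,
                max_depth: Optional[int] = None,
                return_root: Optional[bool] = None) -> str:
    """Prefix a starting-set query with {!graph ...} local parameters."""
    parts = [f"from={from_field}", f"to={to_field}"]
    if traversal_filter is not None:
        # Assumes the filter itself contains no single quotes.
        parts.append(f"traversalFilter='{traversal_filter}'")
    if max_depth is not None:
        parts.append(f"maxDepth={max_depth}")
    if return_root is not None:
        parts.append(f"returnRoot={'true' if return_root else 'false'}")
    return "{!graph " + " ".join(parts) + "}" + start


pid = "urn%3Auuid%3A1ee20ad0-9fec-4b24-bd0d-d0d55aefa49d"

# Inner query: the non-obsoleted resource map(s) containing the PID.
containing_maps = graph_query(f'(id:"{pid}")', "id", "resourceMap",
                              NOT_OBSOLETED, return_root=False)

# Outer query: everything aggregated by those resource maps.
package_members = graph_query(containing_maps, "resourceMap", "id",
                              NOT_OBSOLETED)

print(package_members)
```

Passing `max_depth=1` to the outer call would add the maxDepth parameter in the same way.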
Expressed as a URL (574 matches):
https://arcticdata.io/metacat/d1/mn/v2/query/solr/?q=%7B%21graph+to%3Did+from%3DresourceMap+traversalFilter%3D%27%28%2A%3A%2A+AND+-obsoletedBy%3A%2A%29%27%7D%7B%21graph+from%3Did+to%3DresourceMap+traversalFilter%3D%27%28%2A%3A%2A+AND+-obsoletedBy%3A%2A%29%27+returnRoot%3Dfalse%7D%28id%3A%22urn%253Auuid%253A1ee20ad0-9fec-4b24-bd0d-d0d55aefa49d%22%29&wt=json&fl=id%2CformatId%2CresourceMap%2CobsoletedBy%2Cobsoletes&rows=100&start=0
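For reference, a URL of this shape can be generated with Python's standard library. This is only a sketch: the `solr_url` helper is hypothetical, and the endpoint and parameters are copied from the URL above. Note that the PID is already percent-encoded inside the query string, so it ends up double-encoded (`%253A`) in the final URL.

```python
from urllib.parse import urlencode

# Endpoint taken from the example URL above.
BASE_URL = "https://arcticdata.io/metacat/d1/mn/v2/query/solr/"

QUERY = (
    "{!graph to=id from=resourceMap traversalFilter='(*:* AND -obsoletedBy:*)'}"
    "{!graph from=id to=resourceMap traversalFilter='(*:* AND -obsoletedBy:*)' returnRoot=false}"
    '(id:"urn%3Auuid%3A1ee20ad0-9fec-4b24-bd0d-d0d55aefa49d")'
)


def solr_url(query: str, rows: int = 100, start: int = 0) -> str:
    """Build the request URL; urlencode escapes braces, quotes, spaces, etc."""
    params = {
        "q": query,
        "wt": "json",
        "fl": "id,formatId,resourceMap,obsoletedBy,obsoletes",
        "rows": rows,
        "start": start,
    }
    return BASE_URL + "?" + urlencode(params)


url = solr_url(QUERY)
print(url)
```

Since the example reports 574 matches and each request asks for 100 rows, fetching the whole package would take several requests with increasing `start` values.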
The depth of traversal can be limited by specifying maxDepth. For example, the query above can be limited to only the top level of the resource map hierarchy:
{!graph to=id from=resourceMap traversalFilter='(*:* AND -obsoletedBy:*)' maxDepth=1}
{!graph from=id to=resourceMap traversalFilter='(*:* AND -obsoletedBy:*)' returnRoot=false}
(id:"urn%3Auuid%3A1ee20ad0-9fec-4b24-bd0d-d0d55aefa49d")
As a URL (36 results):
https://arcticdata.io/metacat/d1/mn/v2/query/solr/?q=%7B%21graph+to%3Did+from%3DresourceMap+traversalFilter%3D%27%28%2A%3A%2A+AND+-obsoletedBy%3A%2A%29%27+maxDepth%3D1%7D%7B%21graph+from%3Did+to%3DresourceMap+traversalFilter%3D%27%28%2A%3A%2A+AND+-obsoletedBy%3A%2A%29%27+returnRoot%3Dfalse%7D%28id%3A%22urn%253Auuid%253A1ee20ad0-9fec-4b24-bd0d-d0d55aefa49d%22%29&wt=json&fl=id%2CformatId%2CresourceMap%2CobsoletedBy%2Cobsoletes&rows=100&start=0
Just for kicks - here's a quick viz of the above. Red = ORE docs, green = metadata, blue = other stuff
notebook: https://gist.github.com/datadavev/de006d87eaa25847c260081164c7c398