platform icon indicating copy to clipboard operation
platform copied to clipboard

Add a graph search API and default implementation

Open aeberhart opened this issue 4 years ago • 6 comments

We already have the following APIs:

  • [SearchResult] search(search, limit) - finds results matching the given keyword search
  • Object traverse(db, table, pk, prop) - traverses the foreign key prop from the given context
  • [Origin] incoming(db, table, pk, offset, limit) - finds incoming relationship links

Search and incoming can handle results from different tables ,by returning the object IDs as dj/db/table/pk. This allows the UI to generate the appropriate hyperlinks.

The next logical step would be to allow for path queries:

[Path] pathQuery(db, table, pk, query, limit)
  • a path query needs a starting context that -as always - is specified via db/table/pk
  • the query can be an arbitrary query string - we have the same approach for "normal" structured queries, where DJ does not limit you to a specific query language
  • similar to search, limit restricts the amount of data returned

The result should be a list of path structures. A path indicates the nodes and edges that are used to travel from the source to the destination.

Possible path query languages are:

  • JSONPath / JSONata (one could think of a fk being materialized as a child object and traversing it using a path)
  • AQL: https://www.arangodb.com/docs/stable/aql/graphs-traversals.html
  • PathQL: https://www.datasciencecentral.com/profiles/blogs/pathql-intelligently-finding-knowledge-as-a-path-through-a-maze
  • SPARQL Property Paths: https://www.w3.org/TR/sparql11-property-paths/
  • Recursive SQL: https://en.wikipedia.org/wiki/Hierarchical_and_recursive_queries_in_SQL
FOR v, e, p IN 1..2 any "person/94757"
  GRAPH 'test' RETURN p

aeberhart avatar Dec 06 '21 11:12 aeberhart

You don't mention the query interface, and how it would be related to pathQuery List<Map<String, Object>> query(QueryMeta info, Map<String, Object> arguments)

Would query return 'facts' (subject, property value), and pathQuery return paths?

peterjohnlawrence avatar Dec 06 '21 11:12 peterjohnlawrence

"query" returns tabular result data. If a column projects a (primary or foreign) key, there is a requirement that all keys come from the same table. So if the result of a query is:

[
  { id: 1, name: joe }
  { id: 2, name: mike }
]

then "queryMeta" tells the caller that id is the key of the employee table and thus the UI for instance shows a link to /northwind/EMP/1.

So even if in PathQL, the result is a table, we need a different structure that looks more like the SearchResult struct in order to represent targets from different types / tables.

aeberhart avatar Dec 06 '21 12:12 aeberhart

I think this makes a lot of sense. An analogy:

Schema-centric: Microsoft Outlook and Windows Filemanager depend heavily on folder structures. Collections, such as Inbox or a file folder, are based on the physical folders. Searching always starts within a folder, but searching across all folders is a special case.

Search-centric: GMail and Google Drive depend on tags/attributes of emails and notes. Collections, such as Inbox or a file folder are just predefined searches such as "in:inbox" or "in:myFolder"

So in Lens, and hopefully DashJoin, we can set up navigations using pathQuery and other path query languages. An FK navigation then, just like Google, becomes a special case of a search. With PathQL it is something like :myTable.getFacts("^rdf:type ")

End of rant:-)

peterjohnlawrence avatar Dec 07 '21 10:12 peterjohnlawrence

Final question is what the "Path" return type should look like. A path can be though of as a sequence of traverse operations. A traverse operation allows you to hop from one object to the next along a link property. Considering that the source node is known, a Path consists of:

  • database, table, key: location of the (intermediate) target node
  • fk: id of the foreign / link key to traverse (in the form of dj/db/table/column)

Note that in the first step, we do not consider searches that span multiple backends and thus omit the dj parameter identifying the target node.

If we consider the following data:

EMP: { id: 1, name: joe, worksOn: 1000 }
EMP: { id: 2, name: mike, worksOn: 1000 }
PRJ: { id: 1000, name: sales }

The call (assume the graph query * traverses all paths)

data.graphSearch(northwind, EMP, 1, *)

would return the conncetion from joe to mike via project sales:

// array of Path
[
  // Path 1
  [
    {
       db: northwind
       table: PRJ
       id: 1000
       fk: dj/northwind/EMP/worksOn
    }
    {
       db: northwind
       table: EMP
       id: 2
       fk: dj/northwind/EMP/worksOn
    }
  ]
]

aeberhart avatar Dec 07 '21 13:12 aeberhart

In the case of a relational database and RDF this edge definition is sufficient:

  • fk: id of the foreign / link key to traverse (in the form of dj/db/table/column)

In the case of IntelligentGraph paths, and probably nei4J and other property graphs, I think we would need to add further meta-data about the edge. For example, an IntelligentGraph edge might traverse a reified statement so we would want to know the reification object. In the case of neo4J we might want to retain the property values of the edge.

peterjohnlawrence avatar Dec 07 '21 15:12 peterjohnlawrence

Remaining TODOs:

  • support for property ranges
  • documentation on query syntax, expected return types (in javadoc), and the handling of graph queries in the catalog

aeberhart avatar Dec 21 '21 08:12 aeberhart