atomic-server

Multiple filter queries (activity feed use case)

Open joepio opened this issue 2 years ago • 3 comments

I'd like to show an Activity Feed on Agents' profiles. This means we need Collections that filter by both Agent and Commit. Currently, queries only allow a single filter: one property and one value. This is not enough.

So I think we'll need to make some refactors to our query model.

I think the Query object should have a Vec<(Property, Value)> filter.

But how should we change our query_index::QueryFilter? I think we have two options:

In one Query, potentially use multiple QueryFilters

Let's say you want to get all Commits, older than 3 days, signed by agent X. That's three filters. We could first look up the Commits filter, then the older-than-3-days filter, then the signed-by-agent-X filter, iterate over the subjects, and keep only the ones that match all three.

For Commits, this may become a problem, as the list of Commits is really big. We can't iterate over all of these and expect decent performance in large filter collections.
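The first option can be sketched roughly as a set intersection. This is only an illustration, not the actual query_index code: the `FilterIndex` type, the `query_all` function, and the string-based `Subject`, `Property` and `Value` aliases are all hypothetical stand-ins, and the sketch only covers equality filters (a range filter such as "older than 3 days" would need a sorted index).

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical stand-ins for the real atomic-server types.
type Subject = String;
type Property = String;
type Value = String;

/// One entry per single-filter index: (property, value) -> matching subjects.
type FilterIndex = HashMap<(Property, Value), BTreeSet<Subject>>;

/// Returns the subjects that match every filter by intersecting the
/// per-filter subject sets, starting from the smallest set.
fn query_all(index: &FilterIndex, filters: &[(Property, Value)]) -> BTreeSet<Subject> {
    let mut sets: Vec<&BTreeSet<Subject>> = Vec::new();
    for filter in filters {
        match index.get(filter) {
            Some(set) => sets.push(set),
            // A filter without any indexed subjects means nothing can match.
            None => return BTreeSet::new(),
        }
    }
    // Iterating the smallest set keeps the number of membership checks low.
    sets.sort_by_key(|set| set.len());
    match sets.split_first() {
        None => BTreeSet::new(),
        Some((smallest, rest)) => smallest
            .iter()
            .filter(|subject| rest.iter().all(|set| set.contains(*subject)))
            .cloned()
            .collect(),
    }
}
```

Even with the smallest-set-first ordering, the cost still grows with the size of the smallest matching set, which is the performance concern raised above for large Commit lists.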

Make QueryFilter store Vec<(Property, Value)>

This means we create a new index for each multi-filter query. We solve the performance issue at query time. However, we get far more indexes. This means that for every commit (or: for every Atom inside these commits), we'll have far more QueryFilters to check. At this time, I think these are pretty fast, but I'm not sure how this will scale.
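A minimal sketch of the second option, to make the write-side cost concrete. The `MultiFilter` struct and `on_resource_changed` function are hypothetical names, not the existing QueryFilter API, and resources are simplified to a flat map of string properties to string values.

```rust
use std::collections::{BTreeSet, HashMap};

// Hypothetical stand-ins for the real atomic-server types.
type Subject = String;
type Property = String;
type Value = String;

/// Hypothetical multi-filter variant of QueryFilter: all pairs must match.
struct MultiFilter {
    filters: Vec<(Property, Value)>,
    /// Subjects that currently match every pair; this is the stored index.
    members: BTreeSet<Subject>,
}

impl MultiFilter {
    fn new(filters: Vec<(Property, Value)>) -> Self {
        MultiFilter { filters, members: BTreeSet::new() }
    }

    /// Re-checks one resource against all pairs and updates the index.
    fn update(&mut self, subject: &str, propvals: &HashMap<Property, Value>) {
        let matches = self
            .filters
            .iter()
            .all(|(property, value)| propvals.get(property) == Some(value));
        if matches {
            self.members.insert(subject.to_string());
        } else {
            self.members.remove(subject);
        }
    }
}

/// Write path: every stored filter is checked for every changed resource.
/// The work per write grows with the number of stored filters.
fn on_resource_changed(
    all_filters: &mut [MultiFilter],
    subject: &str,
    propvals: &HashMap<Property, Value>,
) {
    for filter in all_filters.iter_mut() {
        filter.update(subject, propvals);
    }
}
```

Reads become a lookup of `members`, while each write loops over all stored filters, which is the scaling question raised above.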

joepio avatar May 24 '22 14:05 joepio

As another option:

Decouple indices from queries

Allow user to create indices independent of queries. (Perhaps via a new Index class.) Then we need a “query planner” to select appropriate indices for a query.

This option is obviously harder to implement, but it allows for more flexibility, and the user can decide how to trade query performance against write amplification.

rasendubi avatar Jun 03 '22 01:06 rasendubi

@rasendubi thanks for the suggestion. Not sure if I fully understand it, though. What could the Query Planner API look like? Should the Query or QueryFilter API change?

One possibly related thing I was considering is adding an index option to Query, which toggles persisting the index. This would let us limit disk space usage, as the current behavior creates an index for every query.
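Roughly what I have in mind, as a sketch only. The `persist_index` field and the `Store` struct here are made up for illustration and do not match the current Query struct; keeping a persisted index up to date on writes is left out.

```rust
use std::collections::HashMap;

type Subject = String;

/// Simplified single-filter query with a hypothetical persistence flag.
struct Query {
    property: String,
    value: String,
    /// When false, the result is computed by scanning and not stored.
    persist_index: bool,
}

/// Hypothetical store: raw resources plus the indexes persisted so far.
struct Store {
    resources: Vec<(Subject, HashMap<String, String>)>,
    indexes: HashMap<(String, String), Vec<Subject>>,
}

impl Store {
    fn query(&mut self, q: &Query) -> Vec<Subject> {
        let key = (q.property.clone(), q.value.clone());
        // Reuse a persisted index if an earlier query created one.
        if let Some(hit) = self.indexes.get(&key) {
            return hit.clone();
        }
        // Otherwise fall back to scanning all resources.
        let result: Vec<Subject> = self
            .resources
            .iter()
            .filter(|(_, propvals)| propvals.get(&q.property) == Some(&q.value))
            .map(|(subject, _)| subject.clone())
            .collect();
        // Only spend disk space when the caller asked for it.
        if q.persist_index {
            self.indexes.insert(key, result.clone());
        }
        result
    }
}
```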

joepio avatar Jun 03 '22 06:06 joepio

My proposal is very similar to a traditional database design: you can define indices (similar to SQL's CREATE INDEX) and then you can use any query. The performance of a query depends on whether you have an index or not.

A sample index:

{
  "@id": "https://localhost/indices/agentCommits",
  "https://atomicdata.dev/properties/isA": "https://atomicdata.dev/classes/Index",
  "https://atomicdata.dev/properties/index/filters": [
    // Only include commits in the index
    {
      "https://atomicdata.dev/properties/isA": ["https://atomicdata.dev/classes/IndexFilter"],
      "https://atomicdata.dev/properties/property": "https://atomicdata.dev/properties/isA",
      "https://atomicdata.dev/properties/value": "https://atomicdata.dev/classes/Commit"
    }
  ],
  "https://atomicdata.dev/properties/index/include": [
    // Compound index sorted by [signer, createdAt]. (Can probably add a direction here?)
    "https://atomicdata.dev/properties/signer",
    "https://atomicdata.dev/properties/createdAt"
  ]
}

(This index is hard to derive from query automatically as it requires the knowledge that isA: Commit filter never changes, and signer changes frequently. [signer, createdAt] should also be in that order, because [createdAt, signer] doesn't make much sense.)
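To illustrate why the key order matters, here is a small in-memory sketch of such a compound index. The `AgentCommitsIndex` type and its methods are hypothetical, timestamps are simplified to `u64`, and a real implementation would live in the persistent store rather than in a `BTreeSet`.

```rust
use std::collections::BTreeSet;

/// Hypothetical compound index over commits, ordered by
/// (signer, created_at, subject).
struct AgentCommitsIndex {
    entries: BTreeSet<(String, u64, String)>,
}

impl AgentCommitsIndex {
    fn new() -> Self {
        AgentCommitsIndex { entries: BTreeSet::new() }
    }

    fn insert(&mut self, signer: &str, created_at: u64, subject: &str) {
        self.entries
            .insert((signer.to_string(), created_at, subject.to_string()));
    }

    /// Commits by `signer` with `from <= created_at < to`, oldest first.
    /// Because `signer` is the first key component, all of one agent's
    /// commits are contiguous and this is a single range scan.
    fn by_signer_between(&self, signer: &str, from: u64, to: u64) -> Vec<String> {
        if from >= to {
            return Vec::new();
        }
        // The empty string sorts before every other subject, so these
        // bounds cover exactly the requested time window for this signer.
        let lo = (signer.to_string(), from, String::new());
        let hi = (signer.to_string(), to, String::new());
        self.entries
            .range(lo..hi)
            .map(|(_, _, subject)| subject.clone())
            .collect()
    }
}
```

With the order reversed to [createdAt, signer], one agent's commits would be scattered across the whole key space, and the same query would have to scan every commit in the time window.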

Given a query, the query planner is responsible for selecting indices and determining how they should be traversed. The API is roughly plan(query): Plan, where Plan encapsulates how to execute the query (e.g., in SQLite, a plan is a program in SQLite bytecode).
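A very rough sketch of what such a planner could look like. All names here (`Query`, `IndexDef`, `Plan`, `plan`) are illustrative assumptions, not existing atomic-server types, and the selection rule (prefer the usable index with the longest pinned key prefix) is just one simple heuristic.

```rust
type Property = String;
type Value = String;

/// Hypothetical query: a list of equality filters.
struct Query {
    filters: Vec<(Property, Value)>,
}

/// Hypothetical index definition, mirroring the sample index above.
struct IndexDef {
    name: String,
    /// Only resources matching these pairs are in the index.
    filters: Vec<(Property, Value)>,
    /// Properties that form the sorted compound key, in order.
    include: Vec<Property>,
}

#[derive(Debug, PartialEq)]
enum Plan {
    /// Scan `index`, seeking on the first `prefix_len` key properties.
    IndexScan { index: String, prefix_len: usize },
    /// No usable index: check every resource.
    FullScan,
}

fn plan(query: &Query, indices: &[IndexDef]) -> Plan {
    let mut best: Option<(&IndexDef, usize)> = None;
    for idx in indices {
        // The index only holds resources matching its own filters, so it
        // is usable only if the query asks for at least those filters.
        let usable = idx.filters.iter().all(|f| query.filters.contains(f));
        if !usable {
            continue;
        }
        // Count the leading key properties the query pins to a value.
        let prefix_len = idx
            .include
            .iter()
            .take_while(|p| query.filters.iter().any(|(qp, _)| qp == *p))
            .count();
        if best.map_or(true, |(_, len)| prefix_len > len) {
            best = Some((idx, prefix_len));
        }
    }
    match best {
        Some((idx, prefix_len)) => Plan::IndexScan {
            index: idx.name.clone(),
            prefix_len,
        },
        None => Plan::FullScan,
    }
}
```

For the sample index, a query with the filters isA = Commit and signer = X would get an index scan with a key prefix of length one, while a query that only filters on signer falls back to a full scan, because the planner cannot assume its results are all Commits.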


This proposal is somewhat orthogonal to allowing multiple filters on a query. Because a filter does not imply an index with this approach, we can either allow multiple filters on query, or make a filter store Vec<(Property, Value)>—it shouldn't make much difference.

rasendubi avatar Jun 03 '22 16:06 rasendubi