
[proposal] new Storage API

Open mirceanis opened this issue 3 years ago • 6 comments

Is your feature request related to a problem? Please describe.

Veramo can currently use only a single data source for credentials, presentations and messages: an implementation of the IDataStore and IDataStoreORM interfaces.

Most other Veramo top-level plugins use a layered approach, where the top-level plugin acts as a coordinator between multiple lower-level implementations of a common interface.

Also, the IDataStoreORM interface is based on a relational data model, with a lot of assumptions about the connections between the data items being stored. This makes new implementations difficult, so a new solution should be adopted.

Describe the solution you'd like

A new storage model should support multiple data sources. Adopting such a pattern for storage would bring it in line with the rest of the API and would allow users to store data in multiple locations (local private data, remote backup, remote inbox service, remote public information).

The solution chosen should be able to run queries on the data. Examples of common queries include:

  • all credentials issued by an issuer
  • issued after a certain date
  • containing a certain claim
  • matching a certain Type
  • all messages of a particular type
  • all presentations that include a certain @context
  • a VC/VP/message by ID
  • ...

Equally important is the ability to filter for credentials using JSONPath matches that are used in the DIF Presentation Exchange and Credential Manifest protocols.

Additional context

Some related projects already use an adapter pattern to support multiple sources:

mirceanis avatar Nov 07 '22 15:11 mirceanis

We're currently upgrading and updating VCManager for our Snap. This is what we've come up with so far.

Screenshot 2022-11-15 at 10 48 31

The idea is to keep the structure similar to the one we've used before, which we think covers all the use cases mentioned above.

A couple of notes:

  • The filter type can be selected (JSONPath, etc.). Most use cases should use JSONPath, but we still wanted to leave room for other filter types
  • Filtering will be done in VCStorePlugin to keep VCManager as lightweight as possible
  • The id used in delete is a SHA-256 hash of the VC, because id is not mandatory in every VC
  • save can save the same VC in one or more stores (e.g. save(vc, ['database', 'google_drive']))
  • query will return all VCs if filter and store are not provided
  • A clear function can be added to remove all VCs from the selected store
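The notes above can be sketched roughly as follows. This is a minimal in-memory illustration, not the Snap's actual code: `VCManagerSketch`, `contentId` and the predicate-style filter are hypothetical names, and the sketch uses a simple non-cryptographic string hash as a stand-in where the proposal calls for SHA-256, purely to stay dependency-free.

```typescript
// Hypothetical sketch; none of these names come from Veramo or the Snap.
type VC = { [key: string]: unknown };
type Predicate = (vc: VC) => boolean; // stand-in for a JSONPath filter

// Content-derived id. ASSUMPTION: a djb2-style hash replaces SHA-256 here,
// and JSON.stringify replaces a proper canonical serialization.
function contentId(vc: VC): string {
  const text = JSON.stringify(vc);
  let h = 5381;
  for (let i = 0; i < text.length; i++) {
    h = ((h << 5) + h + text.charCodeAt(i)) | 0;
  }
  return (h >>> 0).toString(16);
}

class VCManagerSketch {
  private stores: { [name: string]: { [id: string]: VC } } = {};

  constructor(storeNames: string[]) {
    for (const name of storeNames) this.stores[name] = {};
  }

  private store(name: string): { [id: string]: VC } {
    const s = this.stores[name];
    if (!s) throw new Error("Unknown store: " + name);
    return s;
  }

  // Save the same VC in one or more stores; returns the content-derived id.
  save(vc: VC, storeNames: string[]): string {
    const id = contentId(vc);
    for (const name of storeNames) this.store(name)[id] = vc;
    return id;
  }

  // With no filter and no store, every VC from every store is returned.
  query(filter?: Predicate, storeName?: string): VC[] {
    const names = storeName ? [storeName] : Object.keys(this.stores);
    const results: VC[] = [];
    for (const name of names) {
      const s = this.store(name);
      for (const id of Object.keys(s)) {
        if (!filter || filter(s[id])) results.push(s[id]);
      }
    }
    return results;
  }

  // Delete by content-derived id, since a VC may not carry its own id.
  delete(id: string, storeName: string): void {
    delete this.store(storeName)[id];
  }
}
```

Because the id is derived from the content, saving the same VC into two stores yields the same id in both, which makes a later delete straightforward. A query with no store selected returns one copy per store in this sketch; whether duplicates should be merged is left open.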

andyv09 avatar Nov 15 '22 10:11 andyv09

Thank you for the update, this is great!

I really like the fact that queries are propagated to each implementation because this would allow folks to run the query where the data lives instead of centralizing it locally.

I have a few questions:

  • is the filter type defining the type of query that follows? You mention "JSONPath", but would this be something like "couchdb", or "SQL" in other cases?
    • if so, did you also sketch any error scenarios for when a VCStorePlugin implementation cannot implement a type of query, or when the query fails for some exceptional reason?
  • How about storing something other than VCs, such as presentations, messages, or anything else that looks like JSON and potentially has an id? It looks like the interface you defined is not limited to VCs, which is great, IMO.

And suggestions:

  • The result of query() could be a list of objects containing data+metadata instead of using a returnStore parameter to select the return type. For example:
[
   {data: {<W3CVC>}, meta: {store: "local", id: "asdf"}},
   {data: {<W3CVP>}, meta: {store: "remote", id: "fghj"}}
]
  • All methods of VCStorePlugin should support an options?: {} parameter for easy customization without interface changes. Something like this could be then used to forward authorization parameters for data stores that require them.
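As a rough illustration of these two suggestions, the sketch below returns `{data, meta}` pairs from a multi-store query and threads an optional `options` bag through to each store. Only the `{data, meta}` shape and the idea of an `options` parameter come from the comment above; `StorePluginSketch`, `MemoryStore`, `queryAll` and the `token` option are hypothetical names chosen for the example.

```typescript
// Hypothetical sketch; only the {data, meta} shape and `options` come from the thread.
interface QueryResult<T> {
  data: T;
  meta: { store: string; id: string };
}

interface StoreOptions { [key: string]: unknown }

interface StorePluginSketch<T> {
  query(filter?: (item: T) => boolean, options?: StoreOptions): { id: string; data: T }[];
}

class MemoryStore<T> implements StorePluginSketch<T> {
  constructor(
    private items: { id: string; data: T }[],
    private requiredToken?: string, // ASSUMPTION: a store that needs authorization
  ) {}

  query(filter?: (item: T) => boolean, options?: StoreOptions) {
    // Authorization parameters arrive through `options`, not the signature.
    if (this.requiredToken && (!options || options["token"] !== this.requiredToken)) {
      throw new Error("Not authorized");
    }
    return this.items.filter((entry) => !filter || filter(entry.data));
  }
}

// Query every store and tag each hit with the store it came from.
function queryAll<T>(
  stores: { [name: string]: StorePluginSketch<T> },
  filter?: (item: T) => boolean,
  options?: StoreOptions,
): QueryResult<T>[] {
  const results: QueryResult<T>[] = [];
  for (const name of Object.keys(stores)) {
    for (const entry of stores[name].query(filter, options)) {
      results.push({ data: entry.data, meta: { store: name, id: entry.id } });
    }
  }
  return results;
}
```

Returning metadata alongside every result removes the need for a `returnStore` flag, and callers that do not care about the origin can simply read `data`.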

Please correct me if I misunderstood the API you described for filtering.

mirceanis avatar Nov 15 '22 16:11 mirceanis

Hey, great feedback!

For the questions:

  1. Filter type defines the query that follows. Filter can be anything from 'SQL' to 'JSONPath'.
  2. Filters should be defined in StorePlugins. If you try to use a type of filter that is not supported in that specific StorePlugin it should throw an error.
  3. This model should definitely work in a more generalized form.

With these questions and suggestions in mind we updated the diagram.

Screenshot 2022-11-16 at 13 07 43

Some notes:

  1. DataManager
  • save -> call the save function of the selected store. If multiple stores are provided, save the data object in all of them.
  • batchSave -> go through the array of objects and group the ones with the same store. Call batchSave for every selected store to save its array of data objects.
  • query -> call query on the selected store. If multiple stores are selected, query all of them and join the results. Update meta for every result with its store if returnStore is set to true.
  • delete -> call the delete function of the selected store.
  • batchDelete -> same as batchSave: go through the array and group the ones with the same store. Call batchDelete for every selected store.
  • clear -> call clear for the selected store.
  2. AbstractDataStore
  • save -> save one data object. The object type should be validated here (e.g. VCStorePlugin should throw an error if the object is not of type W3CVerifiableCredential).
  • batchSave -> save multiple data objects. The type should be validated here.
  • query -> should throw an error if filter.type is not supported (e.g. SnapVCStore will probably only support JSONPath). If no filter is provided, return all data objects; otherwise filter them according to filter.type.
  • delete -> remove one object based on its id.
  • batchDelete -> remove multiple data objects based on their ids.
  • clear -> delete all data objects. If a filter is provided, only delete the matching data objects.
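A minimal sketch of how these two layers might fit together is shown below. The method names follow the notes above; everything else is assumed for illustration, including the class names with a `Sketch` suffix, the `PredicateStore` subclass, the sequential ids, and the `"predicate"` filter type that stands in for JSONPath. It is synchronous for brevity, whereas real stores would likely return promises.

```typescript
// Hypothetical sketch; method names follow the notes, the rest is assumed.
interface Filter { type: string; filter: unknown }
interface SaveRequest { data: unknown; store: string }

abstract class AbstractDataStoreSketch {
  abstract supportedFilterTypes: string[];
  protected items: { [id: string]: unknown } = {};
  private nextId = 0; // ASSUMPTION: sequential ids, only to keep the sketch short

  // Subclasses validate the object type and implement their filter types.
  protected abstract validate(data: unknown): void;
  protected abstract applyFilter(filter: Filter, items: unknown[]): unknown[];

  save(data: unknown): string {
    this.validate(data);
    const id = String(this.nextId++);
    this.items[id] = data;
    return id;
  }

  batchSave(data: unknown[]): string[] {
    return data.map((d) => this.save(d));
  }

  query(filter?: Filter): unknown[] {
    const all = Object.keys(this.items).map((id) => this.items[id]);
    if (!filter) return all; // no filter: return everything
    if (this.supportedFilterTypes.indexOf(filter.type) === -1) {
      throw new Error("Unsupported filter type: " + filter.type);
    }
    return this.applyFilter(filter, all);
  }
}

// A store that only understands a "predicate" filter (stand-in for JSONPath).
class PredicateStore extends AbstractDataStoreSketch {
  supportedFilterTypes = ["predicate"];
  protected validate(data: unknown): void {
    if (typeof data !== "object" || data === null) throw new Error("Expected an object");
  }
  protected applyFilter(filter: Filter, items: unknown[]): unknown[] {
    return items.filter(filter.filter as (item: unknown) => boolean);
  }
}

class DataManagerSketch {
  constructor(private stores: { [name: string]: AbstractDataStoreSketch }) {}

  // Group requests by store, then call batchSave once per store.
  batchSave(requests: SaveRequest[]): void {
    const grouped: { [name: string]: unknown[] } = {};
    for (const r of requests) {
      if (!this.stores[r.store]) throw new Error("Unknown store: " + r.store);
      (grouped[r.store] = grouped[r.store] || []).push(r.data);
    }
    for (const name of Object.keys(grouped)) this.stores[name].batchSave(grouped[name]);
  }

  // Query every selected store (all stores by default) and join the results.
  query(filter?: Filter, storeNames?: string[]): unknown[] {
    const names = storeNames || Object.keys(this.stores);
    const results: unknown[] = [];
    for (const name of names) {
      for (const item of this.stores[name].query(filter)) results.push(item);
    }
    return results;
  }
}
```

In this arrangement the manager only routes and merges, while each store decides which filter types it accepts, so an unsupported `filter.type` surfaces as an error from the store that cannot handle it.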

Hope we haven't missed anything.

andyv09 avatar Nov 16 '22 13:11 andyv09

We implemented the proposed DataManager into our Snap (ignore the outdated readme) and created the plugins to cover our needs. We'd love to hear your feedback.

andyv09 avatar Dec 08 '22 10:12 andyv09

I think it's shaping up really well. To really test it out we'd need to put it up against some real-world queries, and see where it starts to produce friction.

Do you plan to raise a PR to push your implementation to upstream?

mirceanis avatar Dec 09 '22 07:12 mirceanis

I'm a bit late to the party here, but it's great to see this conversation happening.

"Filter type defines the query that follows. Filter can be anything from 'SQL' to 'JSONPath'. Filters should be defined in StorePlugins. If you try to use a type of filter that is not supported in that specific StorePlugin it should throw an error."

I'm strongly of the opinion that the filter query format should remain the same regardless of the underlying storage engine. Otherwise you can't simply switch out storage engines without impacting the rest of the codebase. This requires Veramo to be opinionated about the query format.

One of the things I struggled with was managing the multiple databases required for the whole library to work effectively. There's a lot of cross-referencing required throughout the library, e.g. most parts of the library require access to DIDs. As such, the Veramo singleton needs to easily expose access to those databases in some way.

From the PoC Verida DBManager implementation:

      dids: new VeridaDataStoreAdapter(await this.veridaContext.openDatabase('veramo_dids')),
      credentials: new VeridaDataStoreAdapter(await this.veridaContext.openDatabase('veramo_credentials')),
      presentations: new VeridaDataStoreAdapter(await this.veridaContext.openDatabase('veramo_presentations')),
      claims: new VeridaDataStoreAdapter(await this.veridaContext.openDatabase('veramo_claims')),
      messages: new VeridaDataStoreAdapter(await this.veridaContext.openDatabase('veramo_messages'))

Now, it makes sense that all these are needed, but I feel that specifying all these in a single DbManager class is a bit backwards.

My preference would be to have separate generic classes with their own storage configuration. For example, something like this:

veridaDbCredentials = {}
sqlDbCredentials = {}

veramoConfig = {
  dids: {
    // A single datastore
    datastore: new VeridaDatastore('did', veridaDbCredentials)
  },
  credentials: {
    // Multiple datastores
    datastore: [new VeridaDatastore('credentials', veridaDbCredentials), new SQLDatastore('credentials', sqlDbCredentials)]
  }
}

// Generic query interface regardless of the storage engine
VeramoQueryInterface {}

// Generic storage engine interface for all use cases
DatastoreInterface {
 async save()
 async query(query: VeramoQueryInterface)
 async delete()
 ...
}

// Two separate storage engine implementations
class VeridaDatastore implements DatastoreInterface {}
class SQLDatastore implements DatastoreInterface {}

The Veramo components don't need to expose access directly to the underlying storage engines, but provide appropriate interfaces to query, save, delete etc. (similar to what currently happens) and encapsulate any other logic required. I noticed how adjacent metadata was created in different databases for a single action, so it's important this continues where necessary. Exposing the storage engines directly would risk breaking this type of logic.
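To make the idea concrete, here is a small runnable sketch under stated assumptions. The thread leaves the shape of the generic query interface open, so the `VeramoQuery` type below (a list of field-equality conditions) is purely hypothetical, as are `InMemoryDatastore` and `CredentialStoreComponent`. Two in-memory instances stand in for the Verida and SQL engines, and the code is synchronous for brevity where the pseudocode above uses async methods.

```typescript
// Hypothetical sketch; the query shape and all class names are assumptions.
type Rec = { [field: string]: unknown };
interface VeramoQuery { where: { field: string; equals: unknown }[] }
interface Stored { id: string; record: Rec }

// One storage engine interface for every use case.
interface Datastore {
  save(id: string, record: Rec): void;
  query(query: VeramoQuery): Stored[];
  delete(id: string): void;
}

// Stand-in engine; a Verida- or SQL-backed class would implement the same interface.
class InMemoryDatastore implements Datastore {
  private records: { [id: string]: Rec } = {};

  save(id: string, record: Rec): void {
    this.records[id] = record;
  }

  query(query: VeramoQuery): Stored[] {
    return Object.keys(this.records)
      .map((id) => ({ id: id, record: this.records[id] }))
      .filter((s) => query.where.every((c) => s.record[c.field] === c.equals));
  }

  delete(id: string): void {
    delete this.records[id];
  }
}

// The component owns its datastores and exposes only domain-level operations,
// so callers never touch a storage engine directly.
class CredentialStoreComponent {
  constructor(private datastores: Datastore[]) {}

  saveCredential(id: string, credential: Rec): void {
    for (const ds of this.datastores) ds.save(id, credential);
  }

  findByIssuer(issuer: string): Rec[] {
    // The same query object is sent to every engine.
    const query: VeramoQuery = { where: [{ field: "issuer", equals: issuer }] };
    const seen: { [id: string]: boolean } = {};
    const results: Rec[] = [];
    for (const ds of this.datastores) {
      for (const hit of ds.query(query)) {
        if (!seen[hit.id]) {
          seen[hit.id] = true; // de-duplicate records stored in several engines
          results.push(hit.record);
        }
      }
    }
    return results;
  }
}
```

Because the component builds the query itself, swapping one engine for another only changes the array passed to its constructor. The trade-off is that every engine must translate the shared query format into its native one.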

tahpot avatar Jan 03 '23 02:01 tahpot