Mechanism for associating metadata attributes to objects

Open dvasilas opened this issue 6 years ago • 5 comments

Existing use cases for metadata attributes:

  • Security/Access control attributes.
  • User-defined attributes, which can include any CRDT type.

It makes sense to represent metadata attributes as CRDTs and store them in the datastore.

Design options:

Option 1

Store each metadata attribute as a separate object, associated with the data object through its key. The key format can be object_name separator attribute_name, where:

  • object_name is the object's key.
  • separator is a character such as / or a dot (.). Different separators can be used to distinguish attributes that users can reference directly from system attributes that users cannot access directly (e.g. ACLs).
  • attribute_name is the name of a metadata attribute.

The object itself will be stored under the key object_name.

As an example, the attribute Type of an object named my_counter will be accessed under the key my_counter/Type (assuming separator=/).

An additional attribute for each object can be maintained by the system, listing the names of all metadata attributes associated with the object (implemented as a set-CRDT).

Note: This design restricts the keyspace visible to the user, as only part of the key is used for the object name. Object names must not contain the character used as separator.
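
To make the key format concrete, here is a minimal Python sketch of building and parsing Option 1 keys. The choice of "$" as the system separator and the function names attribute_key and split_key are assumptions made for illustration; they are not part of Antidote.

```python
# Hypothetical separators: "/" for user-visible attributes, "$" for system ones.
USER_SEP = "/"
SYSTEM_SEP = "$"


def attribute_key(object_name, attribute_name, system=False):
    """Build the storage key for one metadata attribute of an object."""
    for sep in (USER_SEP, SYSTEM_SEP):
        # Object names must not contain a separator character.
        if sep in object_name:
            raise ValueError(f"object name must not contain {sep!r}")
    sep = SYSTEM_SEP if system else USER_SEP
    return f"{object_name}{sep}{attribute_name}"


def split_key(key):
    """Return (object_name, attribute_name or None, is_system)."""
    positions = [(key.find(sep), sep) for sep in (USER_SEP, SYSTEM_SEP) if sep in key]
    if not positions:
        # No separator: the key names the data object itself.
        return key, None, False
    index, sep = min(positions)  # the first separator ends the object name
    return key[:index], key[index + 1:], sep == SYSTEM_SEP
```

With these definitions, attribute_key("my_counter", "Type") yields my_counter/Type, and split_key on that key recovers the object name and the attribute name.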

Option 2

Couple each object with its metadata attributes as a single object: Each object will be stored as a map-CRDT under the key object_name, containing both metadata attributes under map keys corresponding to the attribute_names and the data object under a special map key.
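
A rough Python sketch of the Option 2 layout follows. A plain dict stands in for the map-CRDT, and the reserved map key "__data__" is a hypothetical name chosen only for this example.

```python
DATA_KEY = "__data__"  # hypothetical reserved map key holding the data object


def make_object(value, attributes):
    """Bundle a data value and its metadata attributes into one map."""
    if DATA_KEY in attributes:
        raise ValueError(f"{DATA_KEY!r} is reserved for the data object")
    bundled = dict(attributes)  # copy so the caller's dict is not modified
    bundled[DATA_KEY] = value
    return bundled


def data_of(bundled):
    """Return the data object stored in the map."""
    return bundled[DATA_KEY]


def attributes_of(bundled):
    """Return only the metadata attributes stored in the map."""
    return {name: v for name, v in bundled.items() if name != DATA_KEY}
```

One consequence of this layout is that a read of the object always fetches its metadata too, since both live under the single key object_name.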

Option 3

For each object there exists an additional metadata object containing all its metadata attributes: Each object will be stored under the key object_name. Its metadata attributes will be stored as a map-CRDT under a key associated with the object_name, such as object_name/md or _object_name.
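
As an illustration of Option 3, the Python sketch below keeps the object under object_name and all of its attributes in one map under a derived key. The in-memory dict named store and the helper names are assumptions for this example, and a plain dict stands in for the map-CRDT.

```python
METADATA_SUFFIX = "/md"  # one of the two naming schemes mentioned above

store = {}  # stand-in for the datastore: key -> value


def metadata_key(object_name):
    """Derive the key of the metadata object from the object's key."""
    return object_name + METADATA_SUFFIX


def put_object(object_name, value):
    """Store the data object under its own key."""
    store[object_name] = value


def set_attribute(object_name, attribute_name, value):
    """Set one attribute in the object's metadata map, creating it if needed."""
    store.setdefault(metadata_key(object_name), {})[attribute_name] = value


def get_attributes(object_name):
    """Return a copy of all metadata attributes of an object."""
    return dict(store.get(metadata_key(object_name), {}))
```

Here the data object and its metadata can be read and updated independently, at the cost of one extra key per object.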


Any of these designs can be implemented at the protocol buffer interface level. The interface would be extended with:

  • Functions for reading and updating user-defined metadata attributes that will manipulate the key format and use regular read_objects and update_objects calls to read and write objects to Antidote.
  • Special functions for reading and updating system metadata attributes (e.g. access control), implementing additional mechanisms for enforcing the required security invariants.
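
The two kinds of functions could be layered roughly as in the Python sketch below. Everything here is an assumption for illustration: InMemoryStore is a stand-in whose read_objects and update_objects only mimic the names of the Antidote calls, not their real signatures, and the administrator check is a placeholder for whatever security invariant is actually required.

```python
class InMemoryStore:
    """Stand-in for a datastore client. Not Antidote's real API."""

    def __init__(self):
        self._data = {}

    def read_objects(self, keys):
        # Return one value per requested key (None if absent).
        return [self._data.get(key) for key in keys]

    def update_objects(self, updates):
        # Each update is a (key, value) pair; a real store would apply CRDT operations.
        for key, value in updates:
            self._data[key] = value


class MetadataInterface:
    USER_SEP = "/"    # hypothetical separator for user-defined attributes
    SYSTEM_SEP = "$"  # hypothetical separator for system attributes

    def __init__(self, store, admins):
        self._store = store
        self._admins = set(admins)

    def read_attribute(self, object_name, attribute_name):
        key = object_name + self.USER_SEP + attribute_name
        return self._store.read_objects([key])[0]

    def update_attribute(self, object_name, attribute_name, value):
        key = object_name + self.USER_SEP + attribute_name
        self._store.update_objects([(key, value)])

    def read_system_attribute(self, object_name, attribute_name):
        key = object_name + self.SYSTEM_SEP + attribute_name
        return self._store.read_objects([key])[0]

    def update_system_attribute(self, caller, object_name, attribute_name, value):
        # Placeholder invariant: only listed administrators may change system attributes.
        if caller not in self._admins:
            raise PermissionError(f"{caller} may not update system attributes")
        key = object_name + self.SYSTEM_SEP + attribute_name
        self._store.update_objects([(key, value)])
```

The user-facing functions only rewrite keys and delegate to the regular calls, while the system functions add a check before delegating.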

Note: To ensure that objects and their metadata attributes are mapped to the same server, the sharding mechanism can be modified to calculate shards based on a prefix of the key, omitting the suffixes used for storing metadata attributes. That way, the objects my_counter and my_counter/Type will be mapped to the same server.
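
A small Python sketch of prefix-based shard selection is given below. The separator set, the function names and the use of SHA-256 as a stable hash are assumptions for illustration; Antidote's actual partitioning function may work differently.

```python
import hashlib

SEPARATORS = ("/", "$")  # hypothetical metadata separators


def shard_prefix(key):
    """Return the part of the key before the first metadata separator."""
    cut = len(key)
    for sep in SEPARATORS:
        index = key.find(sep)
        if index != -1:
            cut = min(cut, index)
    return key[:cut]


def shard_for(key, num_shards):
    """Pick a shard from the key prefix only, ignoring metadata suffixes."""
    digest = hashlib.sha256(shard_prefix(key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards
```

Because only the prefix is hashed, shard_for("my_counter", n) and shard_for("my_counter/Type", n) always return the same shard for any n.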

I propose implementing Option 3 and I can work on it.

dvasilas avatar Sep 01 '17 15:09 dvasilas

On 1 Sept. 2017 at 17:49, dimitriosvasilas wrote:

[…]

As an example, the attribute Type of an object Key1 would be accessed under the key Key1/Type, while the object itself would be accessed under the key Key1.

Actually, I suggest the object itself be accessed under the key Key1:concrete_type, which ensures the reader/writer knows the actual type of the object (and so Key1/Type contains the value concrete_type).

Make sure the store API does not allow direct access to the sub-keys, i.e. the API only accepts Key1 and adds the sub-keys itself.
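
A rough Python sketch of such an API follows. The class name TypedStore, its methods and the in-memory dict are hypothetical; the only parts taken from the suggestion are the Key1:concrete_type and Key1/Type sub-key formats and the rule that callers pass only Key1.

```python
class TypedStore:
    """Accepts plain object keys only and derives the sub-keys itself."""

    TYPE_ATTR = "Type"

    def __init__(self):
        self._data = {}  # stand-in for the datastore

    @staticmethod
    def _check(key):
        # Reject keys that already look like sub-keys.
        if "/" in key or ":" in key:
            raise ValueError("sub-keys cannot be addressed directly")

    def create(self, key, concrete_type, value):
        self._check(key)
        self._data[f"{key}/{self.TYPE_ATTR}"] = concrete_type  # e.g. Key1/Type
        self._data[f"{key}:{concrete_type}"] = value           # e.g. Key1:concrete_type

    def read(self, key):
        self._check(key)
        concrete_type = self._data[f"{key}/{self.TYPE_ATTR}"]
        return concrete_type, self._data[f"{key}:{concrete_type}"]
```

A caller would write store.create("Key1", "antidote_crdt_counter", 1) and later store.read("Key1"); passing "Key1/Type" directly raises ValueError.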

						Marc

marc-shapiro avatar Sep 01 '17 17:09 marc-shapiro

@marc-shapiro It appears that this functionality is already in place. Consider the following test case performing read and write operations:

    ...
    Key1 = clocksi_test6_key1,
    %% The same key, bound to two different CRDT types.
    BoundObj1 = {Key1, antidote_crdt_counter, ?BUCKET},
    BoundObj2 = {Key1, antidote_crdt_mvreg, ?BUCKET},

    {ok, TxId} = rpc:call(FirstNode, cure, start_transaction, [ignore, []]),

    %% Update and read the key as a counter.
    ok = rpc:call(FirstNode, cure, update_objects, [[{BoundObj1, increment, 1}], TxId]),
    {ok, _Res1} = rpc:call(FirstNode, cure, read_objects, [[BoundObj1], TxId]),

    %% Access the same key as a multi-value register.
    %% Distinct result variables are used because _Res would already be bound
    %% by the first read and a second match against it could fail.
    ok = rpc:call(FirstNode, cure, update_objects, [[{BoundObj2, assign, <<"a">>}], TxId]),
    {ok, _Res2} = rpc:call(FirstNode, cure, read_objects, [[BoundObj2], TxId]),

    End = rpc:call(FirstNode, cure, commit_transaction, [TxId]),
    ...

The type of an object needs to be specified for reads and writes.

In fact, the execution results in an error when I update an object using one type and then try accessing the same key with a different type.

=== Reason: no match of right hand side value 
                 {badrpc,
                  {'EXIT',
                   {{function_clause,
                     [{antidote_crdt_mvreg,'-downstream/2-lc$^0/1-1-',
...

However, encoding the type information in the object key does make it easy to check whether an operation uses the correct object type.

dvasilas avatar Sep 07 '17 15:09 dvasilas

It seems to me that the current proposal could be implemented in client applications without changing Antidote itself. Are there plans to later use these attributes for things like search / secondary indexes, or for other use cases that require implementing this directly in Antidote?

peterzeller avatar Sep 07 '17 16:09 peterzeller

In fact, this issue came up because I intend to implement secondary indexes on these attributes, and at the same time other use cases are using security/access control attributes.

It would probably make sense for these efforts to use a unified interface provided by Antidote for managing these attributes, rather than re-implementing similar mechanisms in different ways.

dvasilas avatar Sep 07 '17 17:09 dvasilas

Both Lasp and Riak (2i, Yokozuna, Search) have extensions for storing per-object metadata, if you're interested in looking at how those operate.

cmeiklejohn avatar Sep 12 '17 12:09 cmeiklejohn