Standardised serialisation/deserialisation of data nodes

Open chrisjsewell opened this issue 3 years ago • 4 comments

For use cases such as the web API and declarative workflows, it would be desirable to have a standardised way to construct/deconstruct data nodes, in a round-trippable and language-agnostic manner. It would also be helpful for each node class to declare a schema for its serialised form, e.g. for client-side validation (this is similar to how GraphQL, https://graphql.org/, works).

For a simple data type, this might be something like:

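from aiida.orm import Data, User
import jsonschema


class Int(Data):

    # JSON schema of the serialised form, exposed as a plain class attribute.
    # (Stacking @staticmethod on @property does not give a usable value on the class.)
    schema = {
        'type': 'object',
        'properties': {
            'value': {'type': 'integer'},
            'user_email': {'type': 'string'},
        },
        # Both keys are read unconditionally in deserialize().
        'required': ['value', 'user_email'],
    }

    @classmethod
    def deserialize(cls, data: dict):
        # Validate the payload against the schema before constructing the node.
        jsonschema.validate(data, cls.schema)
        return cls(data['value'], user=User.objects.get(email=data['user_email']))

    def serialize(self):
        return {'value': self.value, 'user_email': self.user.email}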

Note that the schema does not necessarily have to have a one-to-one correspondence with how the data is actually stored, e.g. in the attributes field.
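
As a rough illustration of that last point, here is a minimal sketch in which the serialised form and the stored form differ. `PointsStandIn` is a hypothetical plain-Python stand-in, not an aiida-core class, and its `attributes` dict only mimics the attributes field.

```python
class PointsStandIn:
    """Hypothetical stand-in for a Data node (not part of aiida-core)."""

    def __init__(self, points):
        # Stored column-wise, unlike the row-wise serialised form.
        self.attributes = {
            'xs': [p[0] for p in points],
            'ys': [p[1] for p in points],
        }

    @classmethod
    def deserialize(cls, data: dict):
        # Build the node from the row-wise, schema-facing representation.
        return cls([(p['x'], p['y']) for p in data['points']])

    def serialize(self) -> dict:
        # Rebuild the row-wise representation from the column-wise storage.
        attrs = self.attributes
        return {'points': [{'x': x, 'y': y} for x, y in zip(attrs['xs'], attrs['ys'])]}


payload = {'points': [{'x': 0, 'y': 1}, {'x': 2, 'y': 3}]}
node = PointsStandIn.deserialize(payload)
round_tripped = node.serialize()
```

The round trip returns the original payload even though the internal layout is different, which is all the schema needs to guarantee.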

For data types that store (large) binary data, i.e. in the repository, this is a bit more tricky, since (a) that is not strictly JSONable, and (b) we would want to stream this data, rather than read it into memory. This may require some kind of extension to standard JSON schema.
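
One possible shape for this, sketched with the standard library only: the JSON part carries a small descriptor that references the binary content, while the bytes themselves are copied chunk by chunk through a separate stream. The descriptor keys (`name`, `size`, `sha256`) and both function names are assumptions made for illustration, not an existing aiida-core API or a JSON schema extension.

```python
import hashlib
import io
import json
from typing import BinaryIO, Iterator


def iter_chunks(handle: BinaryIO, chunk_size: int = 4) -> Iterator[bytes]:
    """Yield the content of ``handle`` in pieces of at most ``chunk_size`` bytes."""
    while True:
        chunk = handle.read(chunk_size)
        if not chunk:
            break
        yield chunk


def serialize_with_stream(name: str, handle: BinaryIO, out: BinaryIO) -> str:
    """Copy the binary content to ``out`` in chunks and return a JSON descriptor."""
    digest = hashlib.sha256()
    size = 0
    for chunk in iter_chunks(handle):
        digest.update(chunk)
        out.write(chunk)
        size += len(chunk)
    # The JSON part only references the binary content; it never embeds it.
    return json.dumps({'name': name, 'size': size, 'sha256': digest.hexdigest()})


content = b'binary repository content'
source = io.BytesIO(content)  # stands in for a file in the repository
sink = io.BytesIO()           # stands in for e.g. an HTTP response body
descriptor = json.loads(serialize_with_stream('data.bin', source, sink))
```

The tiny default chunk size is only there to make the chunking visible in the example; a real implementation would use a much larger one.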

chrisjsewell, Mar 23 '22 14:03

cc @louisponet

chrisjsewell, Mar 24 '22 15:03

I think it would also be useful to have the schemas themselves in a JSON file somewhere, so that other languages can use them too. Maybe there could then be a generic .schema function that simply loads the corresponding file?

Of course, if everything goes through a REST API, the schema can also simply be requested through a URL.
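
A minimal sketch of what such a generic loader could look like. The `<type_name>.schema.json` naming convention, the `load_schema` function and the schema directory are all hypothetical; nothing like this is assumed to exist in aiida-core.

```python
import json
import tempfile
from pathlib import Path


def load_schema(schema_dir: Path, type_name: str) -> dict:
    """Load ``<schema_dir>/<type_name>.schema.json`` (hypothetical naming convention)."""
    with (schema_dir / f'{type_name}.schema.json').open(encoding='utf-8') as handle:
        return json.load(handle)


# A temporary directory stands in for the shared, language-neutral schema location.
with tempfile.TemporaryDirectory() as tmp:
    schema_dir = Path(tmp)
    int_schema = {'type': 'object', 'properties': {'value': {'type': 'integer'}}}
    (schema_dir / 'int.schema.json').write_text(json.dumps(int_schema), encoding='utf-8')
    loaded = load_schema(schema_dir, 'int')
```

Any other language could read the same file with its own JSON parser, which is the point of keeping the schema outside the Python class.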

louisponet, Mar 24 '22 16:03

> I think it would also be useful to have the schemas themselves in a json somewhere

Possibly, but this would be difficult to enforce for plugins, and the files would have to be kept in a standard place.

(as you edited 😉 ) the idea with Node.schema is that you would "serve" the schemas via a web API, with the backend running AiiDA in Python and the frontend running whatever language.

chrisjsewell, Mar 24 '22 16:03

For reference: we should consider using MessagePack as a more efficient serialization format (compared to JSON); it also supports binary data.
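
To make the binary-data point concrete, here is a small standard-library sketch of the JSON side of the comparison. It does not use MessagePack itself; it only shows that JSON has no bytes type, so binary content has to be text-encoded first (base64 is used here as one common choice), which inflates the payload by about a third.

```python
import base64
import json

blob = bytes(range(256)) * 3  # 768 bytes of arbitrary binary data

# json.dumps cannot serialise bytes objects directly.
try:
    json.dumps({'blob': blob})
    json_accepts_bytes = True
except TypeError:
    json_accepts_bytes = False

# Workaround: text-encode the bytes, at the cost of a larger payload.
encoded = base64.b64encode(blob).decode('ascii')
document = json.dumps({'blob': encoded})
decoded = base64.b64decode(json.loads(document)['blob'])
overhead = len(encoded) / len(blob)
```

A format with a native binary type avoids both the encoding step and the size overhead.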

csadorf, Jul 11 '22 15:07