aiida-core
aiida-core copied to clipboard
Standardised serialisation/deserialisation of data nodes
For use cases such as the web API and declarative workflows, it would be desirable to have a standardised way to construct/deconstruct data nodes, in a round-trippable and language agnostic manner. In addition, it would also be helpful for the node to declare a schema for the serialisation, e.g. for client-side validation (this is similar to how https://graphql.org/ works).
For a simple data type, this might be something like:
from aiida.orm import Data, User
import jsonschema
class Int(Data):
@staticmethod
@property
def schema():
return {
'type': 'object',
'properties': {
'value': {'type': 'integer'},
'user_email': {'type': 'string'},
}
}
@classmethod
def deserialize(cls, data: dict):
jsonschema.validate(data, cls.schema)
return cls(data['value'], user=User.objects.get(email=data['user_email']))
def serialize(self):
return {'value': self.value, 'user_email': self.user.email}
Note the schema does not necassarily have to have a one-to-one correspondence with how the data is actually stored, e.g. in the attributes field
For data types that store (large) binary data, i.e. in the repository, this is a bit more tricky, since (a) that is not strictly JSONable, and (b) we would want to stream this data, rather than read it into memory. This may require some kind of extension to standard JSON schema.
cc @louisponet
I think it would also be useful to have the schemas themselves in a json somewhere, so the other languages can use them too. Maybe there can then be a generic .schema function that simply loads the corresponding file?
Of course if everything goes through a restapi, that can also simply be requested through a url
I think it would also be useful to have the schemas themselves in a json somewhere
Possibly, but this would be difficult to enforce for plugins, plus having to keep these in a standard place.
(as you edited 😉 )
the idea with Node.schema is that you would "serve" them via a web API, with the backend running aiida in Python, then the frontend running whatever language
For reference: we should consider to use MessagePack as a more efficient serialization format (compared to JSON); also supports binary data.