aiida-core icon indicating copy to clipboard operation
aiida-core copied to clipboard

Alternative implementation for complex number support

Open janssenhenning opened this issue 1 year ago • 2 comments

Also see #5561

In this PR the serialization isn't done on the low level for each backend but on the NodeAttributes/NodeExtras class. Added a function deserialize_value, which is the counterpart of clean_value, which is inserted into the attribute retrieval when the node is stored

@sphuber Support for this could also be added on the BackendNode and the corresponding extras mixin but I didn't see a clean way to change this without renaming/changing the abstractmethods needed when implementing a backend. In this case you could add something like set/get_raw_attribute as the abstractmethods (values are always assumed json-serializable when the node is stored) and set/get_attribute would add the serialization. atm the BackendNode of psql_dos also still calls clean_value, since this behaviour is explicitly tested on the BackendNode directly in many places.

Problems/TODOs:

  • [ ] Support for complex attributes as filters in the QueryBuilder is really messy, since it doesn't have a nice layer to insert serialization of the filters in a backend agnostic way. Maybe something like #5088 would be useful for this
  • [ ] Support for complex numbers in Jsonable. It would work in principle but the initial serialization json.loads(json.dumps(dictionary)) would need to be modified. Simply wrapping dictionary with clean_value does not work, since the class wants to support NaN/inf by converting them into strings

janssenhenning avatar Aug 06 '22 11:08 janssenhenning

Thanks @janssenhenning , apologies for the radio silence on this. Now that we have the hackathon, I think it would be good to have a final discussion about this (maybe in person tomorrow would be good, with those that are interested) and then make a decision on how to best implement this.

I went over the issue and your PR again, and I will write a few thoughts down here that I had that I would like to discuss.

Current situation

  • There only is "serialization" of data that should be stored on a Node as attributes and extras, which is provided by the aiida.orm.implementation.utils.clean_value function. The concept is that the clean_value returns input data such that if it makes a round-trip to being stored as JSON, when it is retrieved, it would be unaltered.
  • The clean_value is currently only called by the PsqlDosBackend storage backend implementation. It calls it only when attributes and extras of stored nodes are mutated and just before a new node is stored. This is done for efficiency purposes, such that attributes and extras are not cleaned multiple times before being stored indefinitely.

The concept

  • Until recently, the data storage in AiiDA was hardcoded, but recently it has become pluginnable and in principle, plugin packages can implement their own data storage backend. I say in principle, because it is not sure whether the abstract interface aiida.orm.implementation.storage_backend.StorageBackend is sufficiently abstracted to completely replace all functionality. Most probable candidate for things to break is the QueryBuilder. That being said, there are already storage backend plugins that simply use the PsqlDosBackend for the database implementation but then extend/replace the file repository.
  • The main promise of AiiDA's data storage is that it consists of two categories of data:
    • JSON-serializable data that can be queried
    • Binary blobs
  • It seems logical that the connection between the front-end ORM and the backend data storage should then only ever communicate in these data types. A storage backend implementation that implements the abstract interface then knows that it only ever has to deal with binary blobs or JSON data.
  • If this could be a hard-contract guaranteed by aiida-core it would make it easier to implement consistent storage backends.

What is missing

  • Given the above, I think it would be good to move the responsibility of serializing/deserializing JSON data from the storage backend implementation to the abstract backend interface.
  • The advantage is that the behavior would be consistent across backend implemenations
  • The disadvantage is that it is probably a bit more restrictive and might limit the implementation to implement the most efficient method (see for example how PsqlDosBackend only serializes just before storing a node, or when mutating stored nodes)
  • @janssenhenning already touched on this on how to potentially do this. We could chose to implement the serialization/deserialization directly on the getters and setters of the BackendNode and BackendEntityExtrasMixin. However, this would make the abstract interface more concrete and require implementations to call super, with the risk of forgetting this and the serialization not being called.
  • Alternative is to keep the getters/setters abstract, and provide a concrete serialization and deserialization methods on the abstract interface that implementation can call. This has the same problem though where an implementation can fail to call the serializer/deserializer. Still I think having the serialization/deserialization in one place in aiida-core would be a good thing.

All in all, I think changing this might make sense, and if we do it, we should do it soon. Currently there are no storage backend plugins out there that I know off (other than my own), so we can now still easily change this, but that might change.

@giovannipizzi @chrisjsewell do you have time to discuss this tomorrow in person?

sphuber avatar Nov 21 '22 22:11 sphuber

@sphuber Thanks for the detailed answer. Would be happy to discuss today how we could get this into aiida-core 👍

janssenhenning avatar Nov 22 '22 08:11 janssenhenning