aiida-core
aiida-core copied to clipboard
Alternative implementation for complex number support
Also see #5561
In this PR the serialization isn't done on the low level for each backend but on the NodeAttributes
/NodeExtras
class. Added a function deserialize_value
, which is the counterpart of clean_value
, which is inserted into the attribute retrieval when the node is stored
@sphuber Support for this could also be added on the BackendNode
and the corresponding extras mixin but I didn't see a clean way to change this without renaming/changing the abstractmethods needed when implementing a backend. In this case you could add something like set/get_raw_attribute
as the abstractmethods (values are always assumed json-serializable when the node is stored) and set/get_attribute
would add the serialization.
atm the BackendNode
of psql_dos
also still calls clean_value
, since this behaviour is explicitly tested on the BackendNode directly in many places.
Problems/TODOs:
- [ ] Support for complex attributes as filters in the QueryBuilder is really messy, since it doesn't have a nice layer to insert serialization of the filters in a backend agnostic way. Maybe something like #5088 would be useful for this
- [ ] Support for complex numbers in Jsonable. It would work in principle but the initial serialization
json.loads(json.dumps(dictionary))
would need to be modified. Simply wrappingdictionary
withclean_value
does not work, since the class wants to support NaN/inf by converting them into strings
Thanks @janssenhenning , apologies for the radio silence on this. Now that we have the hackathon, I think it would be good to have a final discussion about this (maybe in person tomorrow would be good, with those that are interested) and then make a decision on how to best implement this.
I went over the issue and your PR again, and I will write a few thoughts down here that I had that I would like to discuss.
Current situation
- There only is "serialization" of data that should be stored on a
Node
as attributes and extras, which is provided by theaiida.orm.implementation.utils.clean_value
function. The concept is that theclean_value
returns input data such that if it makes a round-trip to being stored as JSON, when it is retrieved, it would be unaltered. - The
clean_value
is currently only called by thePsqlDosBackend
storage backend implementation. It calls it only when attributes and extras of stored nodes are mutated and just before a new node is stored. This is done for efficiency purposes, such that attributes and extras are not cleaned multiple times before being stored indefinitely.
The concept
- Until recently, the data storage in AiiDA was hardcoded, but recently it has become pluginnable and in principle, plugin packages can implement their own data storage backend. I say in principle, because it is not sure whether the abstract interface
aiida.orm.implementation.storage_backend.StorageBackend
is sufficiently abstracted to completely replace all functionality. Most probable candidate for things to break is theQueryBuilder
. That being said, there are already storage backend plugins that simply use thePsqlDosBackend
for the database implementation but then extend/replace the file repository. - The main promise of AiiDA's data storage is that it consists of two categories of data:
- JSON-serializable data that can be queried
- Binary blobs
- It seems logical that the connection between the front-end ORM and the backend data storage should then only ever communicate in these data types. A storage backend implementation that implements the abstract interface then knows that it only ever has to deal with binary blobs or JSON data.
- If this could be a hard-contract guaranteed by
aiida-core
it would make it easier to implement consistent storage backends.
What is missing
- Given the above, I think it would be good to move the responsibility of serializing/deserializing JSON data from the storage backend implementation to the abstract backend interface.
- The advantage is that the behavior would be consistent across backend implemenations
- The disadvantage is that it is probably a bit more restrictive and might limit the implementation to implement the most efficient method (see for example how
PsqlDosBackend
only serializes just before storing a node, or when mutating stored nodes) - @janssenhenning already touched on this on how to potentially do this. We could chose to implement the serialization/deserialization directly on the getters and setters of the
BackendNode
andBackendEntityExtrasMixin
. However, this would make the abstract interface more concrete and require implementations to callsuper
, with the risk of forgetting this and the serialization not being called. - Alternative is to keep the getters/setters abstract, and provide a concrete serialization and deserialization methods on the abstract interface that implementation can call. This has the same problem though where an implementation can fail to call the serializer/deserializer. Still I think having the serialization/deserialization in one place in
aiida-core
would be a good thing.
All in all, I think changing this might make sense, and if we do it, we should do it soon. Currently there are no storage backend plugins out there that I know off (other than my own), so we can now still easily change this, but that might change.
@giovannipizzi @chrisjsewell do you have time to discuss this tomorrow in person?
@sphuber Thanks for the detailed answer. Would be happy to discuss today how we could get this into aiida-core 👍