aiida-core
aiida-core copied to clipboard
Sanitizing ArrayData for REST-API
Sanitizing np.array from Nan and np.inf to ensure that the arrays are proper JSON and can be obtained via the REST-API. Addresses #5559
@sphuber, @chrisjsewell Do you agree with this PR? Or should something else be done before merging?
I'm happy with the concept of changing inf
and nan
to None
. We should probably just add a test and maybe update the documentation of the REST API. Maybe add a note in this section. I also haven't checked in depth whether the numpy
array manipulation is correct or the most efficient solution, but if there is a test for it, I am happy to accept it.
Hi @sphuber!
I'm happy with the concept of changing
inf
andnan
toNone
. We should probably just add a test and maybe update the documentation of the REST API.
Sure thing! I'll add a test, I guess that what we are interested in is setting a test in the route to verify that the resulting JSON is valid. Since otherwise the valid entries of the array are the same.
I was looking at the test_route
and I think that one should then add an array like what is done for the structure
https://github.com/aiidateam/aiida-core/blob/6ab9f51546174fe9d9cf065df013302c5f717fb5/tests/restapi/test_routes.py#L35-L51
After that one would just request for that node and check what is obtained from it like in this case
https://github.com/aiidateam/aiida-core/blob/6ab9f51546174fe9d9cf065df013302c5f717fb5/tests/restapi/test_routes.py#L1004-L1015
Maybe add a note in this section.
Sure thing I'll add a note in the documentation indicating that if an ArrayData
contains np.nan
or np.inf
they will be served as null
to make sure that they are valid JSON.
I also haven't checked in depth whether the
numpy
array manipulation is correct or the most efficient solution, but if there is a test for it, I am happy to accept it.
Yes, I'm also unsure of what the most efficient implementation would be, but this is the best I could find. there is a function in numpy to replace np.nan
as numbers nan_to_num. However, that only allows one to replace the np.nan
by numbers, which of course means that one would need to arbitrarily choose a number to replace the np.nan
and then transform it to None
which can be a bit dangerous.
Trying to use a call like array[np.isnan(array)]= None
will fail as None
cannot be set in a numeric array, if we typecast the array to object
then np.isnan
does not work.
Pandas has a function that allows one to do this without problems, however, then one would need to include pandas as a dependency which seems overkill.
If there is a more efficient way of doing it I'll happily do it, just have not really found a better way without adding extra dependencies.
@sphuber done, test and note in docs added. If something else is needed just let me know.
If the docsstring are giving you issues, it might be best to remove the typing from the docstring and use type hints directly in the function signature. Didn't request this in the review since it is a minor thing, but since you are touching it now anyway, might as well use type hints
Hi @sphuber , funnily enough it seems to be that there were some issues with unavailable resources (in other areas of the code), for example one of the errors that I'm getting is the following
(topics/data_types: line 752) -ignored- https://onlinelibrary.wiley.com/doi/10.1002/qua.560300306: 503 Server Error: Service Temporarily Unavailable for url: https://onlinelibrary.wiley.com/doi/10.1002/qua.560300306
I solved another in which the link for the requests documentation seemed to be pointing to an unsafe ulr that could not be accessed. This one though is a bit strange since I can enter the url but the compilation ends in error.
So removing the numpy.ndarray
type hinting and fixing the broken links worked. I will squash the last commits into a single one to have a cleaner history.