aiida-core icon indicating copy to clipboard operation
aiida-core copied to clipboard

Sanitizing ArrayData for REST-API

Open JPchico opened this issue 1 year ago • 7 comments

Sanitizing np.array from Nan and np.inf to ensure that the arrays are proper JSON and can be obtained via the REST-API. Addresses #5559

JPchico avatar Aug 06 '22 09:08 JPchico

@sphuber, @chrisjsewell Do you agree with this PR? Or should something else be done before merging?

JPchico avatar Aug 10 '22 16:08 JPchico

I'm happy with the concept of changing inf and nan to None. We should probably just add a test and maybe update the documentation of the REST API. Maybe add a note in this section. I also haven't checked in depth whether the numpy array manipulation is correct or the most efficient solution, but if there is a test for it, I am happy to accept it.

sphuber avatar Aug 10 '22 17:08 sphuber

Hi @sphuber!

I'm happy with the concept of changing inf and nan to None. We should probably just add a test and maybe update the documentation of the REST API.

Sure thing! I'll add a test, I guess that what we are interested in is setting a test in the route to verify that the resulting JSON is valid. Since otherwise the valid entries of the array are the same. I was looking at the test_route and I think that one should then add an array like what is done for the structure https://github.com/aiidateam/aiida-core/blob/6ab9f51546174fe9d9cf065df013302c5f717fb5/tests/restapi/test_routes.py#L35-L51

After that one would just request for that node and check what is obtained from it like in this case

https://github.com/aiidateam/aiida-core/blob/6ab9f51546174fe9d9cf065df013302c5f717fb5/tests/restapi/test_routes.py#L1004-L1015

Maybe add a note in this section.

Sure thing I'll add a note in the documentation indicating that if an ArrayData contains np.nan or np.inf they will be served as null to make sure that they are valid JSON.

I also haven't checked in depth whether the numpy array manipulation is correct or the most efficient solution, but if there is a test for it, I am happy to accept it.

Yes, I'm also unsure of what the most efficient implementation would be, but this is the best I could find. there is a function in numpy to replace np.nan as numbers nan_to_num. However, that only allows one to replace the np.nan by numbers, which of course means that one would need to arbitrarily choose a number to replace the np.nan and then transform it to None which can be a bit dangerous. Trying to use a call like array[np.isnan(array)]= None will fail as None cannot be set in a numeric array, if we typecast the array to object then np.isnan does not work. Pandas has a function that allows one to do this without problems, however, then one would need to include pandas as a dependency which seems overkill. If there is a more efficient way of doing it I'll happily do it, just have not really found a better way without adding extra dependencies.

JPchico avatar Aug 11 '22 06:08 JPchico

@sphuber done, test and note in docs added. If something else is needed just let me know.

JPchico avatar Aug 11 '22 16:08 JPchico

If the docsstring are giving you issues, it might be best to remove the typing from the docstring and use type hints directly in the function signature. Didn't request this in the review since it is a minor thing, but since you are touching it now anyway, might as well use type hints

sphuber avatar Aug 17 '22 11:08 sphuber

Hi @sphuber , funnily enough it seems to be that there were some issues with unavailable resources (in other areas of the code), for example one of the errors that I'm getting is the following

(topics/data_types: line  752) -ignored- https://onlinelibrary.wiley.com/doi/10.1002/qua.560300306: 503 Server Error: Service Temporarily Unavailable for url: https://onlinelibrary.wiley.com/doi/10.1002/qua.560300306

I solved another in which the link for the requests documentation seemed to be pointing to an unsafe ulr that could not be accessed. This one though is a bit strange since I can enter the url but the compilation ends in error.

JPchico avatar Aug 17 '22 11:08 JPchico

So removing the numpy.ndarray type hinting and fixing the broken links worked. I will squash the last commits into a single one to have a cleaner history.

JPchico avatar Aug 17 '22 11:08 JPchico