aiida-core icon indicating copy to clipboard operation
aiida-core copied to clipboard

✨ NEW: Add querybuilder jsonschema

Open chrisjsewell opened this issue 2 years ago • 4 comments

This is a draft PR for adding a validation schema for input to the QueryBuilder.

Currently the schema is based on the output of QueryBuilder.as_dict, which is more restrictive than the actual input to QueryBuilder. This is because when init'ing or adding variables to the query, the QueryBuilder coerces certain input types to the final format. If we use this e.g. for validation in the REST API, do we want to allow all possible formats?

Note also, this PR is currently based on top of #5081

chrisjsewell avatar Aug 16 '21 16:08 chrisjsewell

Codecov Report

Merging #5083 (7de53c3) into develop (74571a1) will increase coverage by 0.03%. The diff coverage is 83.19%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop    #5083      +/-   ##
===========================================
+ Coverage    80.44%   80.47%   +0.03%     
===========================================
  Files          531      531              
  Lines        36973    37032      +59     
===========================================
+ Hits         29738    29796      +58     
- Misses        7235     7236       +1     
Flag Coverage Δ
django 74.96% <80.54%> (+0.04%) :arrow_up:
sqlalchemy 73.90% <77.88%> (+0.04%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
aiida/restapi/resources.py 97.26% <ø> (ø)
aiida/tools/importexport/dbexport/main.py 97.31% <ø> (ø)
aiida/orm/implementation/django/querybuilder.py 78.75% <75.00%> (+0.91%) :arrow_up:
aiida/orm/querybuilder.py 82.60% <81.71%> (+0.86%) :arrow_up:
...iida/orm/implementation/sqlalchemy/querybuilder.py 82.03% <87.50%> (+0.99%) :arrow_up:
aiida/tools/graph/age_rules.py 89.19% <93.34%> (ø)
aiida/transports/plugins/local.py 81.41% <0.00%> (-0.25%) :arrow_down:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 74571a1...7de53c3. Read the comment docs.

codecov[bot] avatar Aug 16 '21 16:08 codecov[bot]

If we use this e.g. for validation in the REST API, do we want to allow all possible formats?

Maybe not publicly? However, it also seems unintuitive that a passed dictionary that normally works with QueryBuilder.from_dict() doesn't work with the REST API.

Thinking more about it, I'd consider making the /querybuilder endpoint POSTed JSON data equal what can be done with the QueryBuilder.from_dict() method. It's more intuitive and it makes it easy to handle internally as well. The documentation should be able to be based on the current documentation for what may be passed to the from_dict() method and then that's that? Unless it poses a security vulnerability, of course. I'm guessing the output of to_dict() represents a subset of what can be passed to from_dict()? In which case this also works out.

CasperWA avatar Aug 17 '21 08:08 CasperWA

I'm guessing the output of to_dict() represents a subset of what can be passed to from_dict()? In which case this also works out.

yep

It's more intuitive and it makes it easy to handle internally as well.

I'd kinda disagree with this: the pro is that you can have smaller / less verbose JSONs, e.g.

{
   "project": {
      "tag1": ["property"]
   } 
}

rather than:

{
   "project": {
      "tag1": [{"property1": {}}]
   } 
}

But the con is that it goes against the Python zen; "There should be one-- and preferably only one --obvious way to do it". Users now have to understand multiple possible formats, and the JSON schema has to be more complex to, incorporate the multiple formats, which will probably mean the validation errors it returns will be less "specific" and also validation will be slower (probably not noticeably though)

chrisjsewell avatar Aug 17 '21 09:08 chrisjsewell

It's more intuitive and it makes it easy to handle internally as well.

Actually, another reason that come to mind for wanting to only allow the "final" format in the schema, is that then you could potentially just bypass all the QueryBuilder code that does the validation and type coercion, and set the instance attributes directly. This would intrinsically speed up REST API calls (not sure by how much though)

chrisjsewell avatar Aug 19 '21 04:08 chrisjsewell