jane icon indicating copy to clipboard operation
jane copied to clipboard

Sanitize queries on JSONField in database

Open megies opened this issue 9 years ago • 11 comments

As I understand, getting filtered sets of data (e.g. events in quakeml) uses django's QuerySet.extra(where=[...]) with manually built up query strings. This is pretty hard to extend and there's quite a lot of code involved to build those query strings. Django docs mention that..

Use this method as a last resort

This is an old API that we aim to deprecate at some point in the future. [...]

and..

Warning

You should be very careful whenever you use extra(). Every time you use it, you should escape any parameters that the user can control by using params in order to protect against SQL injection attacks.

Maybe I'm missing something but couldn't this part be simplified by using the API on querying JSONFields?

Also.. django 1.9 now ships JSONField, maybe this should be used instead of the additional jsonfield dependency?

megies avatar Apr 30 '16 09:04 megies

keep in mind that the project started while django 1.6 was the most recent version - at this time the json datatype within postgresql was bleeding edge technology - therefore the usage of extra with costum SQL - so no reason to quote the Django documentation as we were aware of its implications ;)

anyway, I agree that we have to switch to the internal JSONField of Django, but this requires to make Jane compatible with Django 1.9 - currently its based on 1.8

barsch avatar May 03 '16 07:05 barsch

keep in mind that the project started while django 1.6

can't keep in mind what I don't know, all i see is that it's based on 1.8 as it stands now ;-)

so no reason to quote the Django documentation as we were aware of its implications

didn't intend to be offensive, just summarizing the current state

but this requires to make Jane compatible with Django 1.9

Totally clear, I tried to give it a shot but had to give up after working around a couple of issues. Just way too little knowledge about django and the specific setup..

For the time being, I could work around current limitations in the filtering introducing a very ugly negation of the whole where SQL part (see #21).

megies avatar May 03 '16 08:05 megies

I can take care of this if we want to do this - it will greatly simplify the whole thing.

The only thing to keep in mind that it currently works with django 1.8 which is LTS. The json field was only introduced in django 1.9 which has a fairly short life cycle so somebody will need to take care to update it to newer django versions.

krischer avatar May 03 '16 08:05 krischer

the JSONField alone is a reason to go for Django 1.9 - also I'm not a big fan of Django LTS versions as upgrading them later is a huge pain

barsch avatar May 03 '16 08:05 barsch

I've started work on this.. currently this is partially hampered by django/django#6929, the regex filtering doesn't work due to that bug. The patch is super simple, maybe we can work around it somehow..

megies avatar Sep 28 '17 15:09 megies

or someone figures out to either update django-plugins or get rid of it and use some other plugin system and we can just use the latest django.

krischer avatar Sep 28 '17 16:09 krischer

or you could just monkey patch the one function I guess

krischer avatar Sep 28 '17 16:09 krischer

This should be pretty close to being ready now.. but I'm stuck with some problem in the station index querying.. no clue why this is not working as expected:

(Pdb) for i in query: print(i.json['station'])
ALTM
ALTM
ALTM
(Pdb) query.filter(json__station__exact='ALTM')
[]
(Pdb) print(type(query))
<class 'django.contrib.gis.db.models.query.GeoQuerySet'>

megies avatar Sep 29 '17 10:09 megies

Tests pass, only some flake8 fails that I don't get locally. I think this is ready for merging after a review by @krischer or @barsch.


This removes internal usage of where low-level queries (that are deprecated in newer Django) with builtin queries on the new Django builtin jsonfield.

All existing tests pass. Only one test was changed, but I think that it should be changed (also returning null elements when a specific string is excluded, see https://github.com/krischer/jane/pull/20/files#diff-90539ca4a8392804c680442de74b8ee3R396). So there's a slight change in results for exclusion type queries but I think this is closer to the expected result.

The only ugly thing about this PR is that we need to patch Django because of a bug in django <1.11, see https://github.com/krischer/jane/pull/20/commits/894824c4154e5045dacdd3cd3564d87afa69ff86. But the patching seems to work alright, so I think it might be the lesser evil.

Also, this PR should enable having nested information in the json field (i.e. dictionaries), which is currently not supported. (CC @bch0w)

megies avatar Oct 27 '17 11:10 megies

Ok, so I finally found out what's going on with the comma-separated-lists station query.. I thought it never was implemented, but turns out it was, but it never was tested, so I accidentally removed it in this PR. So don't merge yet..

Also I think event queries are completely untested??

megies avatar Oct 27 '17 20:10 megies

Ok, so I finally found out what's going on with the comma-separated-lists station query.. I thought it never was implemented, but turns out it was, but it never was tested, so I accidentally removed it in this PR. So don't merge yet..

Also I think event queries are completely untested??

megies avatar Oct 27 '17 20:10 megies