Sanitize queries on JSONField in database
As I understand, getting filtered sets of data (e.g. events in quakeml) uses django's QuerySet.extra(where=[...]) with manually built up query strings. This is pretty hard to extend and there's quite a lot of code involved to build those query strings.
Django docs mention that..
Use this method as a last resort
This is an old API that we aim to deprecate at some point in the future. [...]
and..
Warning
You should be very careful whenever you use extra(). Every time you use it, you should escape any parameters that the user can control by using params in order to protect against SQL injection attacks.
Maybe I'm missing something but couldn't this part be simplified by using the API on querying JSONFields?
Also.. django 1.9 now ships JSONField, maybe this should be used instead of the additional jsonfield dependency?
keep in mind that the project started while django 1.6 was the most recent version - at this time the json datatype within postgresql was bleeding edge technology - therefore the usage of extra with costum SQL - so no reason to quote the Django documentation as we were aware of its implications ;)
anyway, I agree that we have to switch to the internal JSONField of Django, but this requires to make Jane compatible with Django 1.9 - currently its based on 1.8
keep in mind that the project started while django 1.6
can't keep in mind what I don't know, all i see is that it's based on 1.8 as it stands now ;-)
so no reason to quote the Django documentation as we were aware of its implications
didn't intend to be offensive, just summarizing the current state
but this requires to make Jane compatible with Django 1.9
Totally clear, I tried to give it a shot but had to give up after working around a couple of issues. Just way too little knowledge about django and the specific setup..
For the time being, I could work around current limitations in the filtering introducing a very ugly negation of the whole where SQL part (see #21).
I can take care of this if we want to do this - it will greatly simplify the whole thing.
The only thing to keep in mind that it currently works with django 1.8 which is LTS. The json field was only introduced in django 1.9 which has a fairly short life cycle so somebody will need to take care to update it to newer django versions.
the JSONField alone is a reason to go for Django 1.9 - also I'm not a big fan of Django LTS versions as upgrading them later is a huge pain
I've started work on this.. currently this is partially hampered by django/django#6929, the regex filtering doesn't work due to that bug. The patch is super simple, maybe we can work around it somehow..
or someone figures out to either update django-plugins or get rid of it and use some other plugin system and we can just use the latest django.
or you could just monkey patch the one function I guess
This should be pretty close to being ready now.. but I'm stuck with some problem in the station index querying.. no clue why this is not working as expected:
(Pdb) for i in query: print(i.json['station'])
ALTM
ALTM
ALTM
(Pdb) query.filter(json__station__exact='ALTM')
[]
(Pdb) print(type(query))
<class 'django.contrib.gis.db.models.query.GeoQuerySet'>
Tests pass, only some flake8 fails that I don't get locally. I think this is ready for merging after a review by @krischer or @barsch.
This removes internal usage of where low-level queries (that are deprecated in newer Django) with builtin queries on the new Django builtin jsonfield.
All existing tests pass. Only one test was changed, but I think that it should be changed (also returning null elements when a specific string is excluded, see https://github.com/krischer/jane/pull/20/files#diff-90539ca4a8392804c680442de74b8ee3R396). So there's a slight change in results for exclusion type queries but I think this is closer to the expected result.
The only ugly thing about this PR is that we need to patch Django because of a bug in django <1.11, see https://github.com/krischer/jane/pull/20/commits/894824c4154e5045dacdd3cd3564d87afa69ff86. But the patching seems to work alright, so I think it might be the lesser evil.
Also, this PR should enable having nested information in the json field (i.e. dictionaries), which is currently not supported. (CC @bch0w)
Ok, so I finally found out what's going on with the comma-separated-lists station query.. I thought it never was implemented, but turns out it was, but it never was tested, so I accidentally removed it in this PR. So don't merge yet..
Also I think event queries are completely untested??
Ok, so I finally found out what's going on with the comma-separated-lists station query.. I thought it never was implemented, but turns out it was, but it never was tested, so I accidentally removed it in this PR. So don't merge yet..
Also I think event queries are completely untested??