datawave
datawave copied to clipboard
query validation endpoint
Currently we have a plan endpoint that syntactically validates the query, however there are many ways a user can create a valid query that is most obviously not what they had intended. We need an end point where we can configure rules that find potential query issues and return text describing what those may be.
Here is a list if issues with examples that we should be able to detect. Most all of these based on LUCENE queries:
- Parens before field or completely missing parens when using ORs FIELD:1234 OR 5678 -> should be FIELD:(1234 OR 5678) (FIELD:1234 OR 5678) -> should be FIELD:(1243 OR 5687)
- Phrases not quoted (where we would automatically AND) anywhere that the boolean operator is missing FIELD:term1 term2 -> should be FIELD:"term1 term2"
- Index only fields used in a #INCLUDE or #EXCLUDE
- Fields that do not exist in the system
- A configurable set of rules: If you see field X, then output message Y If you see pattern X (regex), then output message Y
- Missing arguments to #INCLUDE or #EXCLUDE when the first argument is a boolean: #INCLUDE(OR,value) -> should be #INCLUDE(OR, FIELD, value)
- Usage of NOT without using parens (NOT takes precidence so the resulting plan could be quite incorrect) FIELD1:abc OR FIELD2:def NOT FIELD3:123 -> should be (FIELD1:abc OR FIELD2:def) NOT FIELD3:123
- #TIME functions used with fields that are not typed as a date
- Unfielded terms in general should probably be flagged as potential issues
- Wildcards inside of phrases FIELD:"abc*def" -> either the * is incorrect or they meant FIELD:/abc*def/
- Using the wrong quote (` instead of ')
- Proximity wher the number is smaller than the number of terms FIELD:"term1 term2 term3"~1 -> the 1 should be 3 or greater
- Not escaping special characters FIELD:abc:def -> should be FIELD:abc\:def
- Searching for a range of numbers against a non-numeric field