cql-engine icon indicating copy to clipboard operation
cql-engine copied to clipboard

Make Engine validate and correctly handle its own direct inputs

Open duncand opened this issue 7 years ago • 1 comments

The CQL reference implementations are structured of component projects where some of them are kept at arms length intentionally. The CQL Translator and the CQL Engine in particular are kept mutually independent, and we explicitly restrict the Engine itself from depending on the Translator, and vice-versa. Between these two is a single defined common interface which is ELM character strings, that the Translator emits and the Engine consumes/evaluates/performs.

While the Translator and Engine are typically used together, such that the Engine's input is typically just what the Translator emits, that isn't always the case, and it definitely isn't required to be the case.

ELM is a legitimate language of its own, and can be written manually, or can be sourced from various means that are not output by the Translator.

In order for the CQL Engine to legitimately be considered an independent project not dependent on the Translator, it needs to be seriously hardened such that it doesn't trust the well-formedness or correctness of ANY input received from outside itself.

Regarding whatever compile time checks the Java compiler itself doesn't do, for example where any "public" Java routines are defined to accept "Object", or otherwise regarding the ELM parser, the Engine routines need to explicitly validate their arguments and correctly handle every possible input that the Java compiler/runtime allows them to be given.

This is particularly relevant in the implementations of CQL operators, the set of FooEvaluator.java files, each of which in the general case is polymorphic and accepts a variety of argument type combinations. These need to be coded as if the ELM was simply well-formed and did not have any prior type checks at all.

The current polymorphic operators do argument type checks in order to dispatch which of several valid input types to implement for. But in most cases, citing Equivalent() and Equal() as examples, they do not test that both arguments are the same type; they just assume that if the left argument has a particular type then the right argument is the same type. There are some exceptions, such as that those 2 routines DO test both arguments when the left is DateTime or Time.

The routines should be updated to test that both/all arguments are of the expected type(s) and not just one argument.

This is not just about alternate ELM sources. For one thing, even the Translator itself can let things through that the Engine may not expect, whether due to bugs or due to declared CQL type signatures being generic.

Another huge source of unexpected input is external functions. These can result in values of any possible Java type, including ones that don't correspond to the CQL Engine's representation formats for CQL/ELM values.

For example, an external function could return Java Date values, which are not the native representation of CQL/ELM DateTime values and so using Equivalent() on either them or any List or Tuple or other collections having elements of such, results in null rather than true or false.

The task of this ticket/issue is to perform this hardening of the CQL Engine, particularly the FooEvaluator.java files, so that they explicitly (rather than either accidentally or not at all) give predictable and appropriate results for all possible inputs Java can give them. Typically this means using a lot more "instanceof" tests (and "instanceof" gives false for any null left argument, so explicitly testing for null first often isn't necessary as "instanceof" won't throw an exception on getting one).

A tangential matter is that it would be ideal if the CQL Translator were itself split into 2 main parts existing at arms length. The first part simply translates well-formed CQL into well-formed ELM, and does the absolute minimum of type checking or reference validation needed to accomplish this; all operator calls effectively have signatures of "Any" for every argument. The result of the first part alone could be used as input to the CQL Engine. The second part of the CQL Translator split would perform all the type checking and reference validation, and that part takes ELM input and results in true or false if it passes the extra checks; it might also optimize the ELM into more type-specific operator calls if applicable. The only thing the second part does is cause user errors to be caught earlier, but a correct program will behave the same either way. One benefit of this split is separation of concerns. Another benefit of the split is that it is easier to workaround the multitude of Translator bugs that could or do exist, which are almost entirely in the type checking part, and typically result in correct CQL failing translation even though it would make valid ELM and it could be executed correctly by an Engine. That also means the Translator doesn't have to be a source of blocks for adding features to the Engine, such as the logical "implies" operator. This also means that it would be much easier for a comprehensive test suite to test the Engine without the Validator getting in the way. This tangent is being mentioned for information sake and isn't part of satisfying the current ticket/issue, but it does raise another context that the Engine should be able work under.

duncand avatar Feb 25 '18 04:02 duncand

New issue #73 (Make external function interface validate result is a CQL Any value) addresses a specific low hanging fruit of issue #69; #73 gives a large impact, maybe a 50% solution, with very little time and effort, probably handling the most critical short term impact. Some people may also consider that the rest of #73 outside of that isn't actually a problem, and if that's the case, then there's a new focus or simpler goal.

duncand avatar Feb 26 '18 01:02 duncand