maha icon indicating copy to clipboard operation
maha copied to clipboard

OR filter support for all engines

Open pranavbhole opened this issue 6 years ago • 8 comments

pranavbhole avatar Jan 18 '19 01:01 pranavbhole

OrFilter concept query:

"fields":
  [
    { "field": "Homeroom Teacher Name" },
    { "field": "Student ID" },
    { "field": "Class ID" },
    { "field": "Class Name" },
    { "field": "Grade" }
  ],
"filters": [
  { "operator": "=", "fieldName": "Homeroom Teacher Name", "value": "Ms. Olsen" },
  { "operator": "OR", 
    "filters": 
    [
      {"operator": "=", "fieldName": "Student ID", "value": "12345"},
      {"operator": ">", "fieldName": "Grade", "value": "90"},
      {"operator": "<", "fieldName": "Number of Classes", "value": "4"}
    ]
  }
] 

The concept behind this FilterOperation is that it will become a new type of FilterOperation, called CompoundFilterOperation or CombinedFilterOperation.

Rather than having a concept of 'from' and 'to' or 'compareTo', it will contain an operator (or) and a set of input Filters to render.

In this way, assuming we independently decide to implement a new type of nested combined filter (such as AND), that new type can be passed within the list.

This new filter will be phased in as such:

  1. Create and implement CombinedFilter (or CombinedForcedFilter, as is being done with MultiField) without any renderers, only testing the operation itself directly.
  2. Implement OrFilter as a type of CombinedFilter (or ForcedFilter) on OracleEngine.
  3. Implement OrFilter on Hive & Presto Engines.
  4. Implement OrFilter on DruidEngine.

This phased development should serve to simplify the work being done & the review process such that errors can't slip in, as with large requests.

ryankwagner avatar Feb 12 '19 19:02 ryankwagner

Good use case, just few additions to design. CompoundFilterOperation or CombinedFilterOperation filter list should support all the existing filters. Input Reporting Request looks good to me.

pranavbhole avatar Feb 12 '19 20:02 pranavbhole

At the moment, the design looks like it should be able to support all existing, as well as nesting on itself for future usage that is to say: ( = OR = OR ( = AND =) OR =)

As well as any existing FilterOperations. It should be entirely agnostic of what sort of FilterOperation passes into it.

ryankwagner avatar Feb 12 '19 20:02 ryankwagner

The current implementation assumes all items in the filter list are AND'd. We would want to keep the existing functionality intact. However, the user has the option to use logical operators as necessary (AND, OR, NOT).

Here are some scenarios we should cover:

  1. FilterA AND FilterB AND FilterC
    • supported using existing behavior of providing a list of filters
    • supported using an AND logical operator and providing the list of filters to the operator
  2. FilterA OR FilterB AND FilterC
    • supported using an OR logical operator with the list of filters being FilterA and an AND logical operator, e.g. OrFilter(List(FilterA, AndFilter(List(FilterB, FilterC))))
  3. FilterA AND (FilterB OR FilterC)
    • supported using an AND logical operator with the list of filters being FilterA and an OR logical operator, e.g. AndFilter(List(FilterA, OrFilter(List(FilterB, FilterC))))
  4. FilterA AND (FilterB OR (NOT FilterC))
    • supported using an AND logical operator with the list of filters being FilterA and an OR logical operator, e.g. AndFilter(List(FilterA, OrFilter(List(FilterB, NotFilter(FilterC)))))

patelh avatar Feb 12 '19 21:02 patelh

I understand the premise of making this a larger change that allows for top-level AND, OR, NOT however that sort of change would be quite large. It would mean breaking default behavior to allow for top-level OR, AND, NOT.

The proposal here is backwards compatible by giving this new OR filter the same presedence as existing operators. In this case,

{ "operator": "OR", 
    "filters": 
    [
      {"operator": "=", "fieldName": "Student ID", "value": "12345"},
      {"operator": ">", "fieldName": "Grade", "value": "90"},
      {"operator": "<", "fieldName": "Number of Classes", "value": "4"}
    ]
  }

Using strictly this operator on all fields would generate a top-level OR operation without any further top-level filters. Likewise, AND and NOT can be created with the same functionality in mind. As nesting is a possibility, a stretch functionality would look something like:

"filters": [
  {"operator": "OR",
    "filters": [
      {"operator": "AND",
        "filters": [
          {"field": "Student Name", "operator": "=", "value": "Bob"},
          {"field": "Student ID", "operator": "<", "value": "100}
        ]
      },
      {"field": "Class Name", "operator": "<>", "value": "Biology 100"}
    ]
  }
]
((('Student Name' == 'Bob') AND ('Student ID' < 100)) 
OR 
 ('Class Name' NOT 'Biology 100'))

ryankwagner avatar Feb 12 '19 22:02 ryankwagner

@ryankwagner What you are proposing is the same, top level filter which is a logical operator. That's what I also said, just that the existing functionality of not having top level logical operator be preserved. E.g. if top level filter is not a logical operator, then you can just assume all filters are AND'd together, e.g. by default produce AndFilter(filters)...

patelh avatar Feb 12 '19 22:02 patelh

PR #415 outlines the steps that will be taken in OrFilter support more concretely.

This issue can't be cleanly solved in one Pull Request, and will likely take place over 5. Please add comments/suggested edits on this first step, as it will be the basis for future filter work.

ryankwagner avatar Feb 22 '19 02:02 ryankwagner

TODO List:

  • [x] Add OrFilter Operation & rename Pre Rendered Filter Operations. ( PR #415 )
  • [ ] Add helper functions inside RequestModel & QueryGenerator for multi-type filter handling. This includes Filter Field validation (check for dim/fact column existence & throw proper error on duplicates), as well as filter rendering into where or having clauses (and analog in Druid). ( In Progress )
  • [ ] Implement Helper functions into RequestModel & Query Generator Engines, replacing existing "filter.field" functionality.
  • [ ] Formally add OrFilter support into Query Engines.

Future TODOs (Features out of scope):

  • [ ] Add support for nesting AND/OR filters.
  • [ ] Add AND/OR filter types to Cost function.

ryankwagner avatar Feb 22 '19 18:02 ryankwagner