plywood
plywood copied to clipboard
Subquery Equality Filtering on Computed Column Failing
Plywood.ply()
.apply("my_datasource", $("my_datasource")
.filter(
$("timestamp").in({
start: new Date("2018-01-01"),
end: new Date("2018-02-16")
})
))
.apply('visitorTypes', $("my_datasource")
.split({ UserId: '$user__id' })
.apply('user__is_new', $("my_datasource").max('$user__is_first_session')))
.apply('data', $("my_datasource")
.filter($('user__id').in(
$('visitorTypes').filter('$user__is_new == 0').collect($('UserId'))
))
.count())
Created a POST request of
{
"method":"POST",
"url":"https://example.com/druid/v2/",
"body":{
"queryType":"timeseries",
"dataSource":"my_datasource",
"intervals":"2018-01-01T00Z/2018-02-18T19:35:30.768Z",
"granularity":"all",
"context":{
"timeout":10000
},
"filter":{
"type":"or",
"fields":[
]
},
"aggregations":[
{
"name":"__VALUE__",
"type":"count"
}
]
},
"headers":{
"Content-type":"application/json"
}
}
Which makes Druid throw
Error: Unknown exception: Instantiation of [simple type, class io.druid.query.filter.OrDimFilter] value failed: OR operator requires at least one field (through reference chain: io.druid.query.filter.OrDimFilter["fields"])
Since the dataset $user__is_new
is a member of is in-memory, Plywood should filter in-memory instead of passing through to Druid.
Also of note, if I use a quantile filter instead of an equality filter (e.g.
.filter($('ad__id').in(
$('adsByCTR').filter('$ad__ctr <= $adsByCTR.quantile($ad__ctr, 0.25)').collect($('AdId'))
))
it works as expected.