plywood
Granularity builders should support origin parameter
Description
Queries should support `origin` in granularity builders. This is defined via http://druid.io/docs/latest/querying/granularities.html. Right now the response from Druid starts from the nearest year, month, or week depending on the granularity. It should start from the query's start time.
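For reference, here is a minimal sketch of what such a granularity could look like. The `type`, `period`, `timeZone`, and `origin` fields follow the Druid granularities documentation; the `periodGranularity` helper and the example dates are assumptions made for illustration and are not part of Plywood.

```typescript
// Shape of a Druid "period" granularity, per the Druid granularities docs.
interface PeriodGranularity {
  type: "period";
  period: string; // ISO 8601 period, e.g. "P1W"
  timeZone?: string; // e.g. "Etc/UTC"
  origin?: string; // ISO 8601 timestamp that the buckets are aligned to
}

// Hypothetical helper (not a Plywood API): build a period granularity and
// add `origin` only when one is supplied.
function periodGranularity(period: string, timeZone: string, origin?: Date): PeriodGranularity {
  const granularity: PeriodGranularity = { type: "period", period, timeZone };
  if (origin) {
    granularity.origin = origin.toISOString();
  }
  return granularity;
}

// Assumed example: the user's selected range starts on a Wednesday.
const queryStart = new Date("2019-01-16T00:00:00Z");

// Without an origin, Druid aligns weekly buckets to its default week start.
const withoutOrigin = periodGranularity("P1W", "Etc/UTC");

// With an origin, weekly buckets are aligned to the query's start time.
const withOrigin = periodGranularity("P1W", "Etc/UTC", queryStart);
```

In this sketch `withOrigin` is what the issue asks Plywood to be able to emit; `withoutOrigin` corresponds to the current behavior.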
Use Case
- Users with time range filters on their UI want to make an area chart from the user's selected start time to their end time, with a defined granularity.
@vogievetsky
According to http://druid.io/docs/latest/querying/granularities.html, the `origin` parameter only applies to granularities of `type: duration` and `type: period`. Since Plywood doesn't seem to support `type: duration`, I looked for all places using `type: period`. These include:
https://github.com/implydata/plywood/blob/c767f76b73987abb2eabf9020b6b9ad84cadc5b6/src/external/druidExternal.ts#L521-L525
https://github.com/implydata/plywood/blob/c767f76b73987abb2eabf9020b6b9ad84cadc5b6/src/external/utils/druidExtractionFnBuilder.ts#L315-L321
These need to be transformed to add the `origin` parameter in the `granularity` object. However, 2 problems arise with this.
- We do not have access to the time filter in these areas.
- We do not know if the applied time filter only applies to 1 time interval. Since technically Druid supports multiple time intervals in the same query, we might not know which one to pick.
Luckily, both of the places using `type: period` eventually end up in https://github.com/implydata/plywood/blob/c767f76b73987abb2eabf9020b6b9ad84cadc5b6/src/external/druidExternal.ts#L827 which does have access to the time filter.
What I'm thinking is to do the following
- Allow a `setGranularityOrigin` flag in https://github.com/implydata/plywood/blob/c767f76b73987abb2eabf9020b6b9ad84cadc5b6/src/external/druidExternal.ts#L112
- If there is exactly 1 time filter present and `setGranularityOrigin` is true, run through the query objects which might contain a `granularity` object, and set the `origin` parameter in those to the time filter's start time.
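The two steps above could be sketched roughly as follows. This is only an illustration under assumptions: the `applyGranularityOrigin` function, the `Json` type, and the representation of the time filter as Druid-style `start/end` interval strings are all hypothetical, and the real change would have to live inside `druidExternal.ts`.

```typescript
// Minimal stand-in for a Druid query object (assumption for this sketch).
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Hypothetical pass: returns a copy of `query` in which every `granularity`
// object of type "period" gets `origin` set to the start of the time filter.
// The query is returned unchanged unless the flag is on and there is exactly
// one interval. An `origin` that is already present is left alone.
function applyGranularityOrigin(query: Json, intervals: string[], setGranularityOrigin: boolean): Json {
  if (!setGranularityOrigin || intervals.length !== 1) return query;

  // Druid intervals are written as "start/end"; take the start.
  const start = intervals[0]?.split("/")[0];
  if (!start) return query;
  const origin: string = start;

  const visit = (node: Json): Json => {
    if (Array.isArray(node)) return node.map(visit);
    if (node === null || typeof node !== "object") return node;

    const copy: { [key: string]: Json } = {};
    for (const [key, value] of Object.entries(node)) {
      const child = visit(value);
      const isPeriodGranularity =
        key === "granularity" &&
        child !== null &&
        typeof child === "object" &&
        !Array.isArray(child) &&
        child["type"] === "period" &&
        child["origin"] === undefined;
      copy[key] = isPeriodGranularity ? { ...child, origin } : child;
    }
    return copy;
  };

  return visit(query);
}
```

Matching on the `granularity` key covers both a top-level query granularity and one nested inside an extraction function, which are the two places linked above.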
I think it's reasonable to restrict this to cases where there is exactly 1 time filter, since I don't think it would be possible with 2+ time filters.
The only other way I could see this being supported is an optional `origin` parameter in https://github.com/implydata/plywood/blob/c767f76b73987abb2eabf9020b6b9ad84cadc5b6/src/expressions/baseExpression.ts#L1288-L1292
However, since this is a Druid-specific option, it smells pretty funky to go with that route. On the other hand, it would give more control to the programmer in case they did want to do something like multiple time filters, and it would also be more transparent.
Which approach are you more comfortable with? Do you have a better way to handle this?
@robertervin were you able to solve this problem?
I am not 100% sure the origin parameter is meant to be used this way. From my perspective it serves to define a different starting point for time bucketing, e.g. for a fiscal year that deviates from the calendar year. As the documentation at https://druid.apache.org/docs/latest/querying/granularities.html says:
By default, years start on the first of January, months start on the first of the month and weeks start on Mondays unless an origin is specified.
I 100% agree that it would be great to be able to specify an origin different from the normal calendar year but from my perspective it should not automatically match the beginning of the time filter. One should be able to specify it separately.
A common use case for this, as already mentioned, is a fiscal year that does not start on the first of January. I think the beginning of the time filter will often match the origin, but it does not have to, and that was not the intention of this parameter in the first place. Therefore I suggest adding this parameter to Plywood without linking it to the beginning of the time filter.
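To make the fiscal-year case concrete, here is a small sketch of how a yearly bucket start moves when an origin is given. It is only an illustration of the bucketing idea, not Druid's implementation: the `yearBucketStart` function is hypothetical, works in UTC only, and assumes the origin falls at midnight on a day that exists in every year.

```typescript
// Illustration only: start of the yearly bucket that contains `t` when years
// are anchored at `origin` instead of January 1. UTC only; assumes `origin`
// is at midnight on a day of the month that exists in every year.
function yearBucketStart(t: Date, origin: Date): Date {
  const start = new Date(Date.UTC(t.getUTCFullYear(), origin.getUTCMonth(), origin.getUTCDate()));
  if (start.getTime() > t.getTime()) {
    // `t` falls before this year's anchor, so it belongs to the previous bucket.
    start.setUTCFullYear(start.getUTCFullYear() - 1);
  }
  return start;
}

// Assumed example: the fiscal year starts on April 1.
const fiscalOrigin = new Date("2018-04-01T00:00:00Z");

// The time filter starts on February 10, which is not the origin.
const filterStart = new Date("2019-02-10T00:00:00Z");

// The bucket containing the filter start begins at the fiscal-year start.
const bucketStart = yearBucketStart(filterStart, fiscalOrigin);
console.log(bucketStart.toISOString()); // 2018-04-01T00:00:00.000Z
```

Here the filter start and the bucket start differ, which is why the origin needs to be specifiable separately from the time filter.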