turnilo
Do we need overrideAttributes in DataCube?
We need to investigate all common scenarios where overrideAttribute could be used.
Feel free to comment if you use this in your case. @l2dy @alexbusu @cedrics @michalmisiewicz
Thanks for asking.
I use only the attributeOverrides in configuration file. As described here https://github.com/allegro/turnilo/blob/master/docs/configuration.md#attribute-overrides
Can I ask you why? Do you use introspection?
If yes, what does introspection get wrong that you must correct with an override? If no, why not just use the dimension/measure objects to define them in the configuration?
And a bonus question: how would we screw you if we dropped support for attribute overrides?
Introspection was used when the first configuration file was created. The dimension data type is seen as string, and it is indeed. The issue is when the dimension is used in filters or split tiles. I always get 500 Server Error (when no attributeOverrides is used).
I can then put kind: string (most appropriate) or kind: number in dimension options, but this doesn't help.
As I understand, the dimension options are used in Turnilo to format the output and parse the input data accordingly (filters with sliders, etc).
Having the dimension with kind: number I get this chart:
Whereas my dimension numeric values should not be interpreted as a range, but a collection of IDs instead.
That's why the kind: string dimension config. is preferred; to get data like this:
Apart from the dimension options, I guess the attributeOverrides options are used by the plywood library when querying Druid.
We have a similar issue with another dimension (its values are resolved with a lookup formula) when it is used in filters. My debugging attempts reached some JSON parsing library in Druid components. So I had to see the exact query to have a clue; but since the issue was not too sensitive and priorities shifted, I stopped there.
How would you screw me? 🤔 Probably I'll stick with the current version of Turnilo, eventually forking it having this feature in place 😄
But why would you like to get rid of this feature in the first place?
We don't use it at all and see it as unnecessary complexity. But posts like yours will probably change our minds. Give me some time to process it and understand :)
Here's what I got.
Since I'm using unions for data sources, and given the dimension types differences at introspection (e.g. in one data source the field is numeric, in other data source it is string (don't ask why)) it seems the attributeOverrides can be used to resolve such conflicts.
When conflicts of this kind arise then the dimension is dropped, hence the 500 server error:
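The conflict described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `Attribute` type and the `mergeIntrospected` and `applyOverrides` helpers are hypothetical names, not Turnilo's or plywood's actual code, and the "drop on conflict" rule simply mirrors the behaviour reported in this comment.

```typescript
// Hypothetical, simplified model; not Turnilo's or plywood's real types.
type AttrType = "STRING" | "NUMBER" | "BOOLEAN" | "TIME";

interface Attribute {
  name: string;
  type: AttrType;
}

// Merge attributes introspected from every data source in a union.
// An attribute whose type differs between sources is dropped
// (assumed behaviour, as reported in this thread).
function mergeIntrospected(sources: Attribute[][]): Attribute[] {
  const types: Record<string, AttrType> = Object.create(null);
  const conflicting: Record<string, boolean> = Object.create(null);
  for (const source of sources) {
    for (const attr of source) {
      const known = types[attr.name];
      if (known === undefined) {
        types[attr.name] = attr.type;
      } else if (known !== attr.type) {
        conflicting[attr.name] = true;
      }
    }
  }
  return Object.keys(types)
    .filter(name => !conflicting[name])
    .map(name => ({ name, type: types[name] }));
}

// Overrides replace introspected entries with the same name and can
// re-add an attribute that was dropped because of a conflict.
function applyOverrides(introspected: Attribute[], overrides: Attribute[]): Attribute[] {
  const overridden: Record<string, boolean> = Object.create(null);
  for (const o of overrides) overridden[o.name] = true;
  return introspected.filter(a => !overridden[a.name]).concat(overrides);
}

// Example: the same field is numeric in one source and a string in the other.
const sourceA: Attribute[] = [
  { name: "campaign_id", type: "NUMBER" },
  { name: "country", type: "STRING" },
];
const sourceB: Attribute[] = [
  { name: "campaign_id", type: "STRING" },
  { name: "country", type: "STRING" },
];

const merged = mergeIntrospected([sourceA, sourceB]);
const fixed = applyOverrides(merged, [{ name: "campaign_id", type: "STRING" }]);
```

In this sketch `merged` no longer contains `campaign_id`, while `fixed` has it back as a string, which is the role the attributeOverrides entry plays in my configuration.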
Thanks for your patience ❤️
We're using attributeOverrides when we ingest numeric values as strings in order to have an index on them. It would be great if we could remove the attributeOverrides section from the configuration and infer attributeOverrides for plywood based on dimension kind.
I think main issue is here: https://github.com/allegro/turnilo/blob/master/src/common/models/data-cube/data-cube.ts#L785..L818
We don't use information from the config there; for example, all dimensions are treated as strings. The result of this function is required by plywood to construct queries, so it is quite crucial.
We're using attributeOverrides when we ingest numeric values as string in order to have index on it. It would be great if we could remove attributeOverrides section from configuration and infer attributeOverrides for plywood based on dimension kind.
We use overrides for the same reason
Maybe we should not mix responsibilities and keep the things as they are. One thing is what you have in Druid, and another thing is how you want to interpret the data.
That's clear: introspection sometimes returns non-optimal results, sometimes incorrect results, and Turnilo must be able to override them. The question is how to configure and implement an overriding strategy in a consistent and easy-to-understand manner.
For example:
- use "kind" not only for UI modification but also for Druid query customisation
- if "kind" is not provided, type from introspection is used
- because there is no "Boolean" type in Druid, kind "Boolean" should be mapped into Druid String type
- I don't know how to handle kind "Number" because there is more than one numeric type in Druid
- Since there is no "kind" for measures, it has to be added
- I don't know how to handle histograms and uniques, Turnilo discovers the type from Plywood formula
- Finally there is also "native" cluster (wikipedia example) without any introspection :)
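The first few points could be sketched like this. It is a rough illustration, not a proposal for the actual implementation: `DimensionConfig`, `DruidType` and `inferDruidType` are hypothetical names, the set of kinds is assumed to be string/number/boolean/time, and the `DOUBLE` fallback for kind "number" is an arbitrary placeholder because the numeric question above is still open. Measures, histograms and uniques are left out.

```typescript
// Hypothetical types for illustration; not Turnilo's or plywood's real API.
type DimensionKind = "string" | "number" | "boolean" | "time";
type DruidType = "STRING" | "LONG" | "FLOAT" | "DOUBLE";

interface DimensionConfig {
  name: string;
  kind?: DimensionKind; // optional: fall back to introspection when absent
}

function isNumeric(type: DruidType | undefined): boolean {
  return type === "LONG" || type === "FLOAT" || type === "DOUBLE";
}

// Decide which Druid type to assume for a dimension.
// `introspected` maps column names to the types reported by introspection;
// it may be empty (e.g. a cluster without introspection).
function inferDruidType(
  dimension: DimensionConfig,
  introspected: Record<string, DruidType>
): DruidType | undefined {
  const found = Object.prototype.hasOwnProperty.call(introspected, dimension.name)
    ? introspected[dimension.name]
    : undefined;

  switch (dimension.kind) {
    case "string":
      return "STRING";
    case "boolean":
      // Druid has no boolean type, so map it to STRING.
      return "STRING";
    case "number":
      // Open question: Druid has several numeric types. Keep the introspected
      // one when it is numeric; otherwise use a placeholder default (assumption).
      return isNumeric(found) ? found : "DOUBLE";
    default:
      // No kind (or "time", which this sketch does not handle): use introspection.
      return found;
  }
}

const introspectedTypes: Record<string, DruidType> = { price: "LONG", ratio: "FLOAT" };

const booleanType = inferDruidType({ name: "is_robot", kind: "boolean" }, introspectedTypes);
const numberType = inferDruidType({ name: "price", kind: "number" }, introspectedTypes);
const noKindType = inferDruidType({ name: "ratio" }, introspectedTypes);
const unknownType = inferDruidType({ name: "missing" }, {});
```

With no kind and no introspection result the function returns `undefined`, which is where the "native" cluster case would need explicit configuration.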