Do we need overrideAttributes in DataCube?

Open adrianmroz opened this issue 5 years ago • 10 comments

We need to investigate all common scenarios where overrideAttribute could be used.

Feel free to comment if you use this in your case. @l2dy @alexbusu @cedrics @michalmisiewicz

adrianmroz avatar Oct 29 '19 10:10 adrianmroz

Thanks for asking. I use only attributeOverrides in the configuration file, as described here: https://github.com/allegro/turnilo/blob/master/docs/configuration.md#attribute-overrides

alexbusu avatar Oct 29 '19 14:10 alexbusu

Can I ask you why? Do you use introspection?

If yes, what does introspection get wrong that you must correct with an override? If not, why not just use the dimension/measures objects to define them in the configuration?

And a bonus question: how would we screw you if we dropped support for attribute overrides?

adrianmroz avatar Oct 29 '19 14:10 adrianmroz

Introspection was used when the first configuration file was created. The dimension data type is seen as string, and it is indeed a string. The issue arises when the dimension is used in filters or split tiles: I always get a 500 Server Error (when no attributeOverrides is used). [image] [image]

I can then put kind: string (most appropriate) or kind: number in the dimension options, but this doesn't help. As I understand it, the dimension options are used in Turnilo to format the output and parse the input data accordingly (filters with sliders, etc.). With the dimension set to kind: number [image] I get this chart: [image]

My numeric dimension values should not be interpreted as a range, though, but as a collection of IDs. That's why the kind: string dimension config is preferred, to get data like this: [image]

Apart from the dimension options, I guess the attributeOverrides options are used by the plywood library when querying Druid.

We have a similar issue with another dimension (its values are resolved with a lookup formula) when using it in filters. My debugging attempts reached some JSON parsing library in the Druid components, so I would have had to see the exact query to get a clue; but since the issue was not too sensitive and priorities shifted, I stopped there.

How would you screw me? 🤔 Probably I'll stick with the current version of Turnilo, eventually forking it having this feature in place 😄

But why would you like to get rid of this feature in the first place?

alexbusu avatar Oct 29 '19 15:10 alexbusu

> How would you screw me? 🤔 Probably I'll stick with the current version of Turnilo, eventually forking it having this feature in place 😄
>
> But why would you like to get rid of this feature in the first place?

We don't use it at all and see it as unnecessary complexity. But posts like yours will probably change our minds. Give me some time to process it and understand :)

adrianmroz avatar Oct 29 '19 15:10 adrianmroz

Here's what I got. [image]

Since I'm using unions for data sources, and the dimension types differ at introspection (e.g. in one data source the field is numeric, in another it is a string (don't ask why)), it seems attributeOverrides can be used to resolve such conflicts. When a conflict of this kind arises, the dimension is dropped, hence the 500 server error: [image]
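The conflict described here can be sketched in a few lines. This is only an illustration under stated assumptions, not Turnilo's or Plywood's actual code: `AttributeType`, `Attribute`, `mergeIntrospected` and the example column names are all hypothetical. The sketch assumes each source of the union reports its own attribute list, that a column with disagreeing types is dropped, and that an override pins the type so the column survives.

```typescript
// Hypothetical types for illustration only; not the Plywood or Turnilo API.
type AttributeType = "STRING" | "NUMBER" | "TIME";

interface Attribute {
  name: string;
  type: AttributeType;
}

const has = (obj: object, key: string): boolean =>
  Object.prototype.hasOwnProperty.call(obj, key);

// Merge the attributes introspected from every source of a union.
// A column whose type differs between sources is dropped, unless an
// override pins its type explicitly.
function mergeIntrospected(
  sources: Attribute[][],
  overrides: Attribute[] = []
): Attribute[] {
  const pinned: Record<string, AttributeType> = {};
  overrides.forEach(o => { pinned[o.name] = o.type; });

  const seen: Record<string, AttributeType> = {};
  const conflicted: Record<string, boolean> = {};

  sources.forEach(attributes => {
    attributes.forEach(({ name, type }) => {
      if (has(pinned, name)) return; // the override wins over introspection
      if (has(seen, name) && seen[name] !== type) conflicted[name] = true;
      seen[name] = type;
    });
  });

  const merged: Attribute[] = [];
  Object.keys(seen).forEach(name => {
    if (!conflicted[name]) merged.push({ name, type: seen[name] });
  });
  Object.keys(pinned).forEach(name => {
    merged.push({ name, type: pinned[name] });
  });
  return merged;
}
```

With two sources that disagree on a hypothetical `account_id` column (NUMBER in one, STRING in the other), calling `mergeIntrospected` without overrides returns a list without `account_id`, while passing `[{ name: "account_id", type: "STRING" }]` as overrides keeps it as a STRING attribute.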

Thanks for your patience ❤️

alexbusu avatar Oct 29 '19 15:10 alexbusu

We're using attributeOverrides when we ingest numeric values as strings in order to have an index on them. It would be great if we could remove the attributeOverrides section from the configuration and infer attributeOverrides for plywood based on dimension kind.

michalmisiewicz avatar Oct 30 '19 08:10 michalmisiewicz

I think the main issue is here: https://github.com/allegro/turnilo/blob/master/src/common/models/data-cube/data-cube.ts#L785..L818

We don't use information from the config - for example, all dimensions are strings. The result of this function is required by plywood to construct queries, so it is quite crucial.

adrianmroz avatar Oct 30 '19 14:10 adrianmroz

> We're using attributeOverrides when we ingest numeric values as strings in order to have an index on them. It would be great if we could remove the attributeOverrides section from the configuration and infer attributeOverrides for plywood based on dimension kind.

We use overrides for the same reason

cedrics avatar Oct 31 '19 12:10 cedrics

> We're using attributeOverrides when we ingest numeric values as strings in order to have an index on them. It would be great if we could remove the attributeOverrides section from the configuration and infer attributeOverrides for plywood based on dimension kind.

Maybe we should not mix responsibilities and keep things as they are. One thing is what you have in Druid, and another thing is how you want to interpret the data.

alexbusu avatar Oct 31 '19 16:10 alexbusu

That's clear: introspection sometimes returns non-optimal results and sometimes incorrect ones, and Turnilo must be able to override them. The question is how to configure and implement the overriding strategy in a consistent and easy-to-understand manner.

For example:

  • use "kind" not only for UI modification but also for Druid query customisation
  • if "kind" is not provided, type from introspection is used
  • because there is no "Boolean" type in Druid, kind "Boolean" should be mapped into Druid String type
  • I don't know how to handle kind "Number" because there is more than one numeric type in Druid
  • Since there is no "kind" for measures, it would have to be added
  • I don't know how to handle histograms and uniques, Turnilo discovers the type from Plywood formula
  • Finally there is also "native" cluster (wikipedia example) without any introspection :)
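The first few points of this list could be sketched roughly as follows. It is a minimal illustration, not an existing Turnilo function: `DimensionKind`, `DruidType` and `resolveDruidType` are hypothetical names. Because the thread leaves the handling of kind "number" open, the sketch keeps an introspected numeric type when there is one and otherwise returns `undefined` to mark the case as undecided. Letting kind "time" fall back to introspection is also an assumption of the sketch.

```typescript
// Hypothetical names for illustration; not part of Turnilo or Plywood.
type DimensionKind = "string" | "boolean" | "number" | "time";
type DruidType = "STRING" | "LONG" | "FLOAT" | "DOUBLE";

// Decide which Druid type to use for a dimension.
// `kind` comes from the configuration (may be missing),
// `introspected` comes from Druid introspection (may be missing too,
// e.g. for a cluster without introspection).
function resolveDruidType(
  kind: DimensionKind | undefined,
  introspected: DruidType | undefined
): DruidType | undefined {
  switch (kind) {
    case "string":
      return "STRING";
    case "boolean":
      // Druid has no Boolean type, so Boolean dimensions map to STRING.
      return "STRING";
    case "number":
      // Open question in the thread: Druid has several numeric types.
      // Keep the introspected numeric type if there is one, otherwise
      // leave the decision open.
      return introspected !== undefined && introspected !== "STRING"
        ? introspected
        : undefined;
    default:
      // No kind provided (or kind "time", an assumption of this sketch):
      // use the type from introspection.
      return introspected;
  }
}
```

Measures, histograms and uniques are deliberately left out, since the list itself marks them as unresolved.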

mkuthan avatar Nov 08 '19 15:11 mkuthan