dataux
Schema content Meta-Analysis for planner & typing (csv/json)
Understand the contents of underlying data sources via inspection, to feed into planners and schema. A deep understanding of data types, volatility, cardinality, and mutability is the decisive factor in guiding good schema design, optimization, and usage.
- create a datastore of schema info and variables, just like mysql etc. Be able to use an in-mem, file, or persistent store. Depends on https://github.com/araddon/qlbridge/issues/32
- library utils for inspection of types in underlying csv/json sources to do type detection
- [x] scalar values (ints, strings, etc) for csv https://github.com/araddon/qlbridge/blob/159f9a5ff9a9dba83bacd35b92e8306dd7eacf96/datasource/introspect.go#L20
- [ ] detect json, protobuf inside a blob
- nested types (json)
- cardinality for planners (per column)
- mutability
- is this row mutable? Read-only rows (never updated) can be reflected into read-only analytical stores. Also on-disk scannable
- volatility? how often does it change? For low-cardinality non-volatile columns (often enums) it might make sense to store those in memory.
- Table Info: row count, metrics (total byte size, writes/hour/day, reads/day/size, avg row size bytes).
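
As a rough illustration of the scalar type detection and per-column cardinality items above, here is a minimal sketch in Go. It is not the qlbridge `introspect.go` implementation; `ColumnProfile`, `guessType`, `widen`, `ProfileCSV`, and `ProfileCSVString` are hypothetical names, and it assumes the first CSV row is a header and that an exact distinct count fits in memory (a real planner would likely sample or use an approximate counter).

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// ColumnProfile is a hypothetical per-column summary for a planner.
type ColumnProfile struct {
	Name     string
	Type     string // "int", "float", "bool", "string", or "unknown" if only empty cells were seen
	Distinct int    // exact count of distinct non-empty values (cardinality)
}

// guessType returns the narrowest scalar type that parses v.
// Note: strconv.ParseFloat also accepts forms such as "NaN" and "Inf".
func guessType(v string) string {
	if _, err := strconv.ParseInt(v, 10, 64); err == nil {
		return "int"
	}
	if _, err := strconv.ParseFloat(v, 64); err == nil {
		return "float"
	}
	if strings.EqualFold(v, "true") || strings.EqualFold(v, "false") {
		return "bool"
	}
	return "string"
}

// widen merges the type seen so far with the type of a new value.
func widen(cur, next string) string {
	switch {
	case cur == "unknown" || cur == next:
		return next
	case (cur == "int" && next == "float") || (cur == "float" && next == "int"):
		return "float"
	default:
		return "string" // any other mix falls back to string
	}
}

// ProfileCSV reads all rows (assumption: the first row is a header)
// and returns one profile per column.
func ProfileCSV(r io.Reader) ([]ColumnProfile, error) {
	rows, err := csv.NewReader(r).ReadAll()
	if err != nil {
		return nil, err
	}
	if len(rows) == 0 {
		return nil, fmt.Errorf("no header row")
	}
	profiles := make([]ColumnProfile, len(rows[0]))
	seen := make([]map[string]struct{}, len(rows[0]))
	for i, name := range rows[0] {
		profiles[i] = ColumnProfile{Name: name, Type: "unknown"}
		seen[i] = map[string]struct{}{}
	}
	for _, row := range rows[1:] {
		for i, v := range row {
			if v == "" {
				continue // empty cells carry no type information
			}
			profiles[i].Type = widen(profiles[i].Type, guessType(v))
			seen[i][v] = struct{}{}
		}
	}
	for i := range profiles {
		profiles[i].Distinct = len(seen[i])
	}
	return profiles, nil
}

// ProfileCSVString is a convenience wrapper for in-memory data.
func ProfileCSVString(data string) ([]ColumnProfile, error) {
	return ProfileCSV(strings.NewReader(data))
}

func main() {
	data := "id,status,price\n1,active,9.5\n2,inactive,10\n3,active,\n"
	profiles, err := ProfileCSVString(data)
	if err != nil {
		panic(err)
	}
	for _, p := range profiles {
		fmt.Printf("%s: type=%s distinct=%d\n", p.Name, p.Type, p.Distinct)
	}
}
```

On the sample data, `status` comes out as a `string` column with two distinct values, the kind of low-cardinality column the volatility item suggests could be kept in memory. Mutability and volatility cannot be inferred from a single snapshot like this; they would need write metrics or repeated inspection over time.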