Oliver Kennedy
Currently the error-aware CSV parser is a modified version of the existing Spark CSV parser.
1. There's a lot of overhead in the CSV parser for dealing with things that...
To see this bug in action:
```
mimir> load 'test/data/temperature.csv';
mimir> create lens repaired as select * from temperature with domain('temperature > -30');
mimir> feedback temperature 0 is real;
mimir>...
```
Another use for adaptive schemas: creating pivot tables. Consider for example:
```
time,location,temperature
1505958603,den,24.5
1505958604,basement,21.3
1505959204,den,24.5
1505959204,basement,21.400000000000002
1505959803,den,24.6
1505959804,basement,21.3
1505960265,office,17.5
1505960403,den,24.5
1505960404,basement,21.40000000000000
1505961003,den,24.5
1505961005,basement,21.400000000000002
1505961603,den,24.400000000000002
1505961604,basement,21.3
1505962203,den,24.400000000000002
```
It...
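As a rough illustration of the pivot idea, here is a minimal Python sketch that turns long-format rows like the sample into one wide row per timestamp, with one column per location. This is not Mimir code: the `pivot` helper, its parameters, and the choice of `time` as the row key are all assumptions made for illustration, since the rest of the issue text is cut off.

```python
import csv
import io
from collections import defaultdict

# A few rows copied from the sample in the issue.
SAMPLE = """time,location,temperature
1505958603,den,24.5
1505958604,basement,21.3
1505960265,office,17.5
"""


def pivot(rows, index, column, value):
    """Hypothetical helper: one wide row per `index` value, one cell per `column` value."""
    wide = defaultdict(dict)
    columns = []  # distinct column values, in first-seen order
    for row in rows:
        if row[column] not in columns:
            columns.append(row[column])
        wide[row[index]][row[column]] = row[value]
    # Cells that were never observed become None so every row has every column.
    table = {key: {c: cells.get(c) for c in columns} for key, cells in wide.items()}
    return columns, table


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
columns, table = pivot(rows, "time", "location", "temperature")
```

Note that the set of output columns (`den`, `basement`, `office`) is only known after reading the data, which is presumably what makes this a fit for adaptive schemas. Because the sensors in the sample report at slightly different timestamps, pivoting on the exact `time` value leaves most cells empty; bucketing timestamps first would be one way around that.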
As of right now, ANALYZE only detects sources of uncertainty injected by Mimir. It would be helpful if Mimir had some facility to do syntactic analysis on a dataset being...
`UNNEST`
It would be nice to have an UNNEST operator, essentially an inverse aggregate (one row/cell to many). Examples:
* Iterate over the elements of a JSON array
* Regexp matches...
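To make the "one row to many" behaviour concrete, here is a minimal Python sketch covering the JSON-array and regexp cases. The `unnest`, `json_elements`, and `regexp_matches` names are hypothetical and only model the semantics; they are not a proposed Mimir API or syntax.

```python
import json
import re


def unnest(rows, column, expand):
    """Emit one output row per item that `expand` produces from row[column]."""
    for row in rows:
        for item in expand(row[column]):
            # Copy the row, replacing the nested cell with a single item.
            yield {**row, column: item}


def json_elements(cell):
    """Expand a cell holding a JSON array into its elements."""
    return json.loads(cell)


def regexp_matches(pattern):
    """Build an expander that yields every match of `pattern` in a cell."""
    compiled = re.compile(pattern)
    return lambda cell: compiled.findall(cell)


tagged = [{"id": 1, "tags": '["a", "b"]'}, {"id": 2, "tags": "[]"}]
flat = list(unnest(tagged, "tags", json_elements))

numbers = list(unnest([{"id": 3, "text": "x=1, y=22"}], "text", regexp_matches(r"\d+")))
```

In this sketch a row whose cell expands to nothing (the empty array for `id` 2) produces no output rows. Whether such rows should instead be kept with a null value is a design choice a real operator would have to make.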
It might be interesting to see whether we can incorporate ideas from this system for extracting tabular data from PDFs: https://github.com/WZBSocialScienceCenter/pdftabextract
https://github.com/stanford-futuredata/macrobase