ratatool
[WIP] Descriptive statistics
Hackday project in need of feedback.
Working with data you're interested in:
- Shape of data, which could be your schema
- General case, which is addressed by bigSampler
- Edge cases, which this PR tries to tackle
It's inspired by `summary` from R.
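For readers unfamiliar with R's `summary`, here is a rough sketch of the kind of output it gives for a numeric column: minimum, quartiles, median, mean, and maximum. This is not the code in this PR. The function name `summarize` and the returned dictionary keys are hypothetical, and the sketch is in plain Python rather than the project's own stack, purely for illustration.

```python
import statistics


def summarize(values):
    """Hypothetical helper: min, quartiles, mean and max of a numeric column."""
    data = sorted(values)
    if not data:
        raise ValueError("cannot summarize an empty collection")
    if len(data) == 1:
        # A single value is its own quartile; avoids relying on
        # version-specific behaviour of statistics.quantiles.
        q1 = median = q3 = float(data[0])
    else:
        # "inclusive" treats min and max as the 0th and 100th percentiles
        # and interpolates linearly between neighbouring data points.
        q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "mean": statistics.fmean(data),
        "q3": q3,
        "max": data[-1],
    }


if __name__ == "__main__":
    print(summarize([1, 2, 3, 4, 5]))
```

Edge cases such as empty input or a single value are handled explicitly here; booleans and floating point values with NaN would need their own treatment, as the todo list below suggests.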
Todo:
- [ ] Property and unit tests
- [ ] Support booleans
- [ ] Support floating point numbers
- [ ] Support for different formats (protobuf)?
Hey, thanks for taking the initiative! We have some internal stuff that overlaps a bit. I've been thinking a lot about the future of that, and it might be good to make sure we're on the same page.
@idreeskhan may I ask, have the data profiling tools been open sourced since then?
Sorry, this comment got lost in email while I was on vacation back in June. They have not been open sourced, but we are hesitant to merge this in. Internally, the data profiling tools fit our needs, and merging this would mean taking over its maintenance, which we don't really want to do at the moment.