metacrafter
Add extended reporting
Right now the report includes only the field name, data type, tags, semantic type ID, and registry URL. Sometimes additional information is required, and some of it is already collected during the matching process.
Consider adding the following data (already collected) to the report:
- [x] number of unique values
- [x] share of unique values
- [x] minimal length
- [x] max length
- [x] average length
- [ ] minimal value
- [ ] maximum value
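As a rough illustration of the fields listed above, here is a minimal sketch of how they could be computed from a field's sample values. The function name `field_stats` and the output keys are hypothetical and not part of metacrafter; minimal and maximum values are compared as strings here, which is an assumption (numeric or date fields would need parsing first).

```python
from statistics import mean


def field_stats(values):
    """Compute extended report statistics for one field's sample values.

    Hypothetical helper for illustration only; not metacrafter's API.
    """
    # Work on string forms and skip missing values.
    strs = [str(v) for v in values if v is not None]
    if not strs:
        return {}
    lengths = [len(s) for s in strs]
    n_unique = len(set(strs))
    return {
        "n_unique": n_unique,
        "share_unique": n_unique / len(strs),
        "min_len": min(lengths),
        "max_len": max(lengths),
        "avg_len": mean(lengths),
        # Assumption: lexicographic comparison of the string forms.
        "min_value": min(strs),
        "max_value": max(strs),
    }
```

Called as `field_stats(["a", "bb", "bb", "ccc"])`, this returns one dict per field that could be merged into the existing report row.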
Consider collecting and adding the following info:
- [x] has alphas
- [x] has digits
- [x] has special chars
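The three flags above could be derived in a single pass over the sample. This is only a sketch: `char_class_flags` is a hypothetical name, and "special" is assumed to mean any character that is neither alphanumeric nor whitespace, which may differ from what metacrafter actually uses.

```python
def char_class_flags(values):
    """Report which character classes occur anywhere in the sample values.

    Hypothetical helper for illustration only; not metacrafter's API.
    """
    flags = {"has_alphas": False, "has_digits": False, "has_special": False}
    for v in values:
        if v is None:
            continue
        for ch in str(v):
            if ch.isalpha():
                flags["has_alphas"] = True
            elif ch.isdigit():
                flags["has_digits"] = True
            elif not ch.isspace():
                # Assumption: anything that is not a letter, digit,
                # or whitespace counts as a special character.
                flags["has_special"] = True
    return flags
```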
If possible, add the following:
- [ ] reconstructed regexp - regular expression reconstructed from a data sample
- [ ] named entities - named entities extracted by one of the named entity detection tools, such as Microsoft Presidio, Slovnet, or others
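For the regexp item, one possible approach (a sketch only, not an existing metacrafter feature) is to map each character to a class, collapse runs of the same class, and emit a pattern when all sample values share the same sequence of classes. The names `reconstruct_regexp` and `_char_class` are hypothetical, and only ASCII letters and digits are generalized; every other character is kept as a literal.

```python
import re
import string
from itertools import groupby


def _char_class(ch):
    """Map a character to a regex fragment (ASCII letters and digits only)."""
    if ch in string.digits:
        return r"\d"
    if ch in string.ascii_letters:
        return "[A-Za-z]"
    return re.escape(ch)  # keep punctuation and other characters as literals


def reconstruct_regexp(values):
    """Return a regex matching all sample values, or None if their shapes differ."""
    # Each shape is a list of (class, run length) pairs, e.g. [(r"\d", 4), ...].
    shapes = [
        [(cls, len(list(run))) for cls, run in groupby(_char_class(ch) for ch in v)]
        for v in values
    ]
    if not shapes:
        return None
    class_seqs = {tuple(cls for cls, _ in shape) for shape in shapes}
    if len(class_seqs) != 1:
        return None  # samples do not share one sequence of character classes
    parts = []
    for i, cls in enumerate(next(iter(class_seqs))):
        counts = [shape[i][1] for shape in shapes]
        lo, hi = min(counts), max(counts)
        if lo == hi == 1:
            quant = ""
        elif lo == hi:
            quant = f"{{{lo}}}"
        else:
            quant = f"{{{lo},{hi}}}"
        parts.append(cls + quant)
    return "^" + "".join(parts) + "$"
```

For a sample of ISO-style dates such as `["2021-01-15", "1999-12-31"]`, the result is a pattern of four digits, a hyphen, two digits, a hyphen, and two digits. Samples with mixed shapes return `None`, so a real implementation would probably need clustering or alternation on top of this.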