metacrafter

Add extended reporting

Open • ivbeg opened this issue 3 years ago • 0 comments

Right now the report includes only the field name, data type, tags, semantic type ID, and registry URL. Sometimes additional information is required, and much of it is already collected during the matching process.

Consider adding the following data to the report (already collected):

  • [x] number of unique values
  • [x] share of unique values
  • [x] minimum length
  • [x] maximum length
  • [x] average length
  • [ ] minimum value
  • [ ] maximum value

Consider collecting and adding the following information:

  • [x] has alphas
  • [x] has digits
  • [x] has special chars
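
To make the requested fields concrete, here is a minimal sketch of how they could be computed from one field's sample values. It is not metacrafter's actual code: the function name `field_stats` and the output keys are hypothetical, values are compared as strings (so minimum and maximum are lexicographic), and "special chars" is assumed to mean anything that is neither alphanumeric nor whitespace.

```python
from statistics import mean


def field_stats(values):
    """Compute extended report statistics for one field (hypothetical helper)."""
    # Work on the string form of every non-null value.
    strings = [str(v) for v in values if v is not None]
    if not strings:
        return {}
    lengths = [len(s) for s in strings]
    unique = set(strings)
    chars = [c for s in strings for c in s]
    return {
        "n_unique": len(unique),
        "share_unique": len(unique) / len(strings),
        "min_len": min(lengths),
        "max_len": max(lengths),
        "avg_len": mean(lengths),
        # Lexicographic comparison; a typed comparison would need the detected data type.
        "min_value": min(strings),
        "max_value": max(strings),
        "has_alphas": any(c.isalpha() for c in chars),
        "has_digits": any(c.isdigit() for c in chars),
        # Assumption: "special" means neither alphanumeric nor whitespace.
        "has_special": any(not c.isalnum() and not c.isspace() for c in chars),
    }
```

For example, `field_stats(["a1", "b2", "a1", "c-3"])` reports 3 unique values out of 4 and flags special characters because of the hyphen.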

If possible, add the following:

  • [ ] reconstructed regexp - a regular expression reconstructed from a data sample
  • [ ] named entities - named entities extracted by a named entity detection tool such as Microsoft Presidio, Slovnet, or others
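
One naive way to approach the reconstructed regexp item is sketched below. It is only an illustration under stated assumptions, not an existing metacrafter feature: the helper names (`char_class`, `shape`, `reconstruct_regexp`) are hypothetical, only ASCII digits and letters are generalized, every other character is kept as an escaped literal, and the function gives up when the samples do not share the same sequence of character classes.

```python
import re
from itertools import groupby


def char_class(c):
    """Map one character to a regex fragment (digit, letter, or escaped literal)."""
    if c.isascii() and c.isdigit():
        return r"\d"
    if c.isascii() and c.isalpha():
        return "[A-Za-z]"
    return re.escape(c)


def shape(s):
    """Run-length encode a string as [(fragment, run_length), ...]."""
    return [(cls, len(list(run))) for cls, run in groupby(s, key=char_class)]


def reconstruct_regexp(samples):
    """Return a pattern matching every sample, or None if their shapes differ."""
    shapes = [shape(s) for s in samples]
    if not shapes:
        return None
    classes = [cls for cls, _ in shapes[0]]
    if any([cls for cls, _ in sh] != classes for sh in shapes):
        return None  # samples do not share one class sequence
    parts = []
    for i, cls in enumerate(classes):
        lo = min(sh[i][1] for sh in shapes)
        hi = max(sh[i][1] for sh in shapes)
        if lo == hi == 1:
            quant = ""
        elif lo == hi:
            quant = "{%d}" % lo
        else:
            quant = "{%d,%d}" % (lo, hi)
        parts.append(cls + quant)
    return "^" + "".join(parts) + "$"
```

A real implementation would likely need to handle optional segments, alternations, and outliers in the sample; this sketch only widens repetition counts over a fixed class sequence.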

ivbeg, Aug 05 '22 16:08