Add analysis of schema structure decomposition of field keys and subtypes

Open ivbeg opened this issue 1 year ago • 0 comments

Flat tabular datasets (CSV files, database tables) and sometimes objects with nested objects often include fields that could be grouped.

For example, the CSV file Zaara_D.csv includes the following fields: title, text, date, place, placeURL, placeLocation, placeType, reviewScore, avgScore.

Here the prefix 'place' acts as a subtype identifier. The four place fields could be decomposed into a single 'place' object, with the bare 'place' field serving as the name:

  • Name
  • Location
  • URL
  • Type

Likewise, the suffix 'Score' (reviewScore, avgScore) identifies the value type: a number, either integer or float.

Most data tables use a case change or the "_" symbol as the divider between name parts. The '-' symbol is also used, but very rarely.
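A minimal sketch of splitting field names on these dividers, in Python. The function name split_field_name and the regular expression are illustrative assumptions, not existing metacrafter code:

```python
import re
from typing import List

# Matches an acronym run (URL), a capitalized or lowercase word (Location, place),
# or a run of digits. Hypothetical pattern, chosen for this illustration only.
_TOKEN_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_field_name(name: str) -> List[str]:
    """Split a field name into tokens on "_", "-", and case changes."""
    tokens: List[str] = []
    # First split on explicit dividers, then on case changes inside each part.
    for part in re.split(r"[_\-]+", name):
        tokens.extend(_TOKEN_RE.findall(part))
    return tokens


if __name__ == "__main__":
    for field in ["placeURL", "placeLocation", "avg_score", "place-type"]:
        print(field, split_field_name(field))
```

With this pattern, placeURL splits into place and URL, and avg_score into avg and score. Characters outside ASCII letters and digits are dropped, which a real implementation would likely need to handle.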

Detection of field groups and decomposition of field names could help with:

  • additional rules to detect semantic data types
  • automatic context identification

Add group detection to the final report as a field_group property.
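A rough sketch of how prefix-based group detection could work, assuming a group exists whenever at least two fields share the same leading token. The helper names and the returned mapping are hypothetical; the actual report format is not defined in this issue:

```python
import re
from collections import defaultdict
from typing import Dict, List, Optional


def first_token(name: str) -> str:
    """Return the leading token of a field name, lowercased."""
    match = re.match(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", name)
    return match.group(0).lower() if match else name.lower()


def detect_field_groups(fields: List[str]) -> Dict[str, Optional[str]]:
    """Map each field to a field_group value, or None if it has no group."""
    by_prefix = defaultdict(list)
    for field in fields:
        by_prefix[first_token(field)].append(field)
    # Assumption: a prefix counts as a group only if two or more fields share it.
    return {
        field: (prefix if len(members) > 1 else None)
        for prefix, members in by_prefix.items()
        for field in members
    }


if __name__ == "__main__":
    fields = ["title", "text", "date", "place", "placeURL",
              "placeLocation", "placeType", "reviewScore", "avgScore"]
    for field, group in detect_field_groups(fields).items():
        print(field, group)
```

For the Zaara_D.csv fields this assigns the group place to the four place fields and no group to the rest. Suffix groups such as Score are not covered here; the same approach applied to the last token could detect them.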

ivbeg avatar Aug 06 '22 08:08 ivbeg