metacrafter
Add schema structure analysis: decomposition of field keys and subtypes
Flat tabular datasets (CSV files, database tables) and sometimes objects with nested objects often include fields that could be grouped.
For example, the CSV file Zaara_D.csv includes the following fields: title, text, date, place, placeURL, placeLocation, placeType, reviewScore, avgScore
Here the prefix 'place' acts as a subtype identifier. These fields could be decomposed as place:
- Name
- Location
- URL
- Type
The postfix 'Score', in turn, identifies the value type (integer or float).
Most data tables use a case change (as in placeURL) or the "_" symbol as the divider between name parts. The '-' symbol is also used, but very rarely.
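A minimal sketch of how field names could be split on these dividers. The helper name split_field_name is hypothetical and not part of metacrafter; it assumes ASCII field names and treats runs of capitals such as URL as a single token.

```python
import re

# One "word" inside a camelCase chunk: an acronym run (URL), a capitalized or
# lowercase word (Score, place), or a run of digits.
_WORD_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def split_field_name(name):
    """Split a field name into lowercase tokens (hypothetical helper)."""
    tokens = []
    # First cut on explicit dividers: "_", "-" and whitespace.
    for chunk in re.split(r"[_\-\s]+", name):
        # Then cut each chunk on case changes.
        tokens.extend(_WORD_RE.findall(chunk))
    return [token.lower() for token in tokens]


for field in ["placeURL", "placeLocation", "review_score", "avg-score"]:
    print(field, "->", split_field_name(field))
```

Lowercasing the tokens makes placeType, place_type and place-type compare equal, which is convenient when looking for shared prefixes and postfixes.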
Detection of field groups and decomposition of field names could help with:
- additional rules to detect semantic data types
- automatic context identification
Add group detection to the final report as a field_group property.
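One possible shape for this, as a sketch only: fields are grouped by their leading token, and each report entry gets a field_group value. The function names, the min_size threshold and the report layout are assumptions made for illustration, not the existing metacrafter report format.

```python
import re
from collections import defaultdict

_WORD_RE = re.compile(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+")


def first_token(name):
    """Return the lowercase leading word of a field name, or None."""
    for chunk in re.split(r"[_\-\s]+", name):
        words = _WORD_RE.findall(chunk)
        if words:
            return words[0].lower()
    return None


def detect_field_groups(fields, min_size=2):
    """Map each shared prefix to the fields that start with it."""
    by_prefix = defaultdict(list)
    for field in fields:
        prefix = first_token(field)
        if prefix is not None:
            by_prefix[prefix].append(field)
    # A prefix used by a single field is not a group (assumed threshold).
    return {p: members for p, members in by_prefix.items() if len(members) >= min_size}


def annotate_fields(fields):
    """Build report rows with a field_group property (hypothetical layout)."""
    groups = detect_field_groups(fields)
    member_of = {f: p for p, members in groups.items() for f in members}
    return [{"field": f, "field_group": member_of.get(f)} for f in fields]


fields = ["title", "text", "date", "place", "placeURL", "placeLocation",
          "placeType", "reviewScore", "avgScore"]
for row in annotate_fields(fields):
    print(row)
```

For the Zaara_D.csv fields this puts place, placeURL, placeLocation and placeType into the group place and leaves the other fields ungrouped. Grouping by a shared postfix such as Score would follow the same pattern using the last token instead of the first.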