Ivan Begtin
Support XML files, with the following list of tasks: - [x] Support XML files with XML tag name provided - [x] Add examples to documentation - [ ] Collect examples with...
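A minimal sketch of what reading records by a user-provided tag name could look like, using only the standard library. The helper name `iter_records` and the flat one-level record layout are assumptions for illustration, not the tool's actual implementation.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(fileobj, tagname):
    """Yield one dict per element named `tagname` (hypothetical helper).

    Assumes each record element holds flat child elements with text values.
    """
    for _event, elem in ET.iterparse(fileobj, events=("end",)):
        if elem.tag == tagname:
            yield {child.tag: child.text for child in elem}
            # Free memory held by the processed record on large files.
            elem.clear()


sample = b"<root><item><id>1</id><name>A</name></item><item><id>2</id><name>B</name></item></root>"
records = list(iter_records(io.BytesIO(sample), "item"))
```

Streaming with `iterparse` avoids loading the whole document, which matters when the XML dump is large.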
Automate detection of empty values and exclude them from data analysis. Possible empty values: None, 'N/A', empty string, 'NaN', 'None', '-'. The following actions are required: - [ ] Add...
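The empty-value check described above might be sketched as follows. The marker set comes from the list in the issue; the names `EMPTY_MARKERS`, `is_empty` and `non_empty`, the float NaN check, and stripping whitespace before comparison are assumptions added for illustration.

```python
import math

# String markers treated as empty, taken from the list in the issue.
EMPTY_MARKERS = {"", "N/A", "NaN", "None", "-"}


def is_empty(value):
    """Return True if the value should be excluded from analysis."""
    if value is None:
        return True
    # Assumption: a float NaN is treated like the 'NaN' string marker.
    if isinstance(value, float) and math.isnan(value):
        return True
    if isinstance(value, str):
        # Assumption: surrounding whitespace is ignored when comparing.
        return value.strip() in EMPTY_MARKERS
    return False


def non_empty(values):
    """Keep only the values that are not considered empty."""
    return [v for v in values if not is_empty(v)]


cleaned = non_empty([None, "N/A", "", "NaN", "None", "-", "42", 0])
```

Note that `0` and other falsy non-string values are kept, since only the listed markers count as empty.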
Without a cache, the tool reloads its rules on each run, which makes it harder to process thousands of datasets from the command line.
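One possible shape for such a cache, sketched with a pickle file that is reused while the source file's modification time is unchanged. The function `load_with_cache` and its `parse` callback are hypothetical; how the tool actually loads its rules is not shown here.

```python
import os
import pickle


def load_with_cache(source_path, cache_path, parse):
    """Return parse(source_path), reusing a pickled result when possible.

    `parse` is any callable that loads the source file (hypothetical);
    the cache is considered valid while the source mtime is unchanged.
    """
    mtime = os.path.getmtime(source_path)
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            cached_mtime, data = pickle.load(f)
        if cached_mtime == mtime:
            return data
    data = parse(source_path)
    with open(cache_path, "wb") as f:
        pickle.dump((mtime, data), f)
    return data
```

A real implementation would also need to handle a corrupt or outdated cache file, which this sketch leaves out.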
Some database fields are just incremental unique identifiers generated by the database engine. They can't be linked with any external identifier databases and are used only locally by databases....
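One way such fields might be recognized is a simple heuristic that checks whether the values are integers increasing by exactly one. The function `looks_incremental` is a hypothetical sketch; the step-of-one rule is an assumption and would miss sequences with gaps left by deleted rows.

```python
def looks_incremental(values):
    """Heuristic: True if values parse as integers that each grow by 1."""
    try:
        numbers = [int(v) for v in values]
    except (TypeError, ValueError):
        return False
    if len(numbers) < 2:
        return False
    return all(b - a == 1 for a, b in zip(numbers, numbers[1:]))


is_id_column = looks_incremental(["1", "2", "3", "4"])
```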
Sometimes exported CSV files include whitespace before or after values, either for clearer formatting or to fit fixed-width data fields. Whitespace should be removed automatically using the `strip()` function....
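A minimal sketch of stripping every cell while reading a CSV file, using the standard `csv` module and `str.strip()`. The helper name `read_stripped` is an assumption for illustration.

```python
import csv
import io


def read_stripped(fileobj):
    """Yield CSV rows with leading and trailing whitespace removed per cell."""
    for row in csv.reader(fileobj):
        yield [cell.strip() for cell in row]


sample = io.StringIO("name , city\n Alice ,  Paris \n")
rows = list(read_stripped(sample))
```

The header row is stripped as well, so padded column names do not leak into later field matching.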
Flat table datasets (CSV files, database tables) and sometimes objects with nested objects often include elements that could be grouped. For example, the CSV file [Zaara_D.csv](https://github.com/apicrafter/metacrafter/files/9274615/Zaara_D.csv) includes the following fields: title, text,...
Nested documents in JSON, JSON Lines, XML, etc. are detected as str objects instead of dict or array objects. Example: the nested objects `Scores` and `Geocode` are detected as strings. - [ ] implement detection...
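The intended behaviour could be sketched as a type check that tests for containers before falling back to `str`. The function `detect_type`, the type labels, and the shape of the sample record (including the contents of `Scores` and `Geocode`) are assumptions for illustration only.

```python
def detect_type(value):
    """Return a coarse type label, checking nested containers first."""
    if isinstance(value, dict):
        return "dict"
    if isinstance(value, (list, tuple)):
        return "array"
    # bool is checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    return "str"


# Hypothetical record; the real field contents are not shown in the issue.
record = {
    "Name": "Example",
    "Scores": [1, 2, 3],
    "Geocode": {"lat": 55.75, "lon": 37.61},
}
types = {key: detect_type(value) for key, value in record.items()}
```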
Named entity recognition technology helps to identify named objects inside texts. **Strengths** - makes it possible to identify objects inside text blobs - could make it possible to support more named entities (identifiers) **Weaknesses**...
title: Russian weather by radiometric analysis dataformat: IMG datapublisher: State enterprise "Central Aerological Observatory" dataurl: http://www.nowcast.ru/ , http://www.nowcast.ru/data/uvk.html author: Ivan Begtin authorurl: http://infoculture.ru What's bad? 1. Data or documents unavailable....