Ivan Begtin

Results 195 issues of Ivan Begtin

Support XML files, with the following list of tasks: - [x] Support XML files with XML tag name provided - [x] Add examples to documentation - [ ] Collect examples with...

enhancement

Automate detection of empty values and exclude them from data analysis. Possible empty values: None, 'N/A', empty string, 'NaN', 'None', '-'. The following actions are required: - [ ] Add...

enhancement
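
The check described in this issue could be sketched roughly as follows. This is only an illustration, not the tool's actual implementation: the names `EMPTY_MARKERS`, `is_empty` and `drop_empty` are hypothetical, and treating the markers case-insensitively and ignoring surrounding whitespace are assumptions that the issue text does not state.

```python
# Hypothetical marker set, taken from the list of possible empty values in the issue.
# Stored lowercased; case-insensitive matching is an assumption.
EMPTY_MARKERS = {"", "n/a", "nan", "none", "-"}


def is_empty(value):
    """Return True if the value should be treated as empty."""
    if value is None:
        return True
    # A float NaN is the only float that is not equal to itself.
    if isinstance(value, float) and value != value:
        return True
    if isinstance(value, str):
        # Assumption: surrounding whitespace does not make a marker meaningful.
        return value.strip().lower() in EMPTY_MARKERS
    return False


def drop_empty(values):
    """Return only the values that are not considered empty."""
    return [v for v in values if not is_empty(v)]


sample = [None, "N/A", "", "NaN", "None", "-", "Moscow", 0]
cleaned = drop_empty(sample)
```

Note that `0` is kept: only the listed markers, `None` and NaN are treated as empty in this sketch.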

Without a cache, the tool reloads the rules on each run, which makes it harder to process thousands of datasets from the command line.

enhancement

Some database fields are just incremental unique identifiers generated by the database engine. They can't be linked with any external identifier databases and are used only locally by databases....

enhancement

Sometimes exported CSV files include whitespace before or after values, either for clearer formatting or to fit fixed-width data fields. Whitespace should be removed automatically using the `strip()` function....

enhancement
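
A minimal sketch of the stripping described in this issue, using only the standard `csv` module. The helper name `read_stripped_rows` and the sample data are hypothetical; how the tool itself reads CSV files is not shown in the issue.

```python
import csv
import io


def read_stripped_rows(fileobj):
    """Yield CSV rows with leading and trailing whitespace removed from every cell."""
    for row in csv.reader(fileobj):
        yield [cell.strip() for cell in row]


# Hypothetical sample with padded values, as in a fixed-width style export.
sample = io.StringIO("id , name \n 1 ,  Anna\n")
rows = list(read_stripped_rows(sample))
```

Stripping happens per cell after parsing, so whitespace inside a value (for example between two words) is left untouched.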

Flat table datasets (CSV files), database tables, and sometimes objects with nested objects often include elements that could be grouped. For example, the CSV file [Zaara_D.csv](https://github.com/apicrafter/metacrafter/files/9274615/Zaara_D.csv) includes the following fields: title, text,...

enhancement

Nested documents in JSON, JSON Lines, XML, etc. are detected as str objects instead of dict or array objects. Example: the nested objects `Scores` and `Geocode` are detected as strings. - [ ] implement detection...

bug
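
The expected behaviour can be illustrated with a small type check. This is a sketch under assumptions, not the project's code: the function name `detect_type`, the returned labels, and the sample values are hypothetical; only the field names `Scores` and `Geocode` come from the issue.

```python
import json


def detect_type(value):
    """Return a coarse type label for a value parsed from JSON."""
    if value is None:
        return "null"
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "bool"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        return "str"
    if isinstance(value, dict):
        return "dict"
    if isinstance(value, (list, tuple)):
        return "array"
    return type(value).__name__


# Hypothetical record; the values are made up for illustration.
line = '{"Name": "Cafe", "Scores": [5, 4], "Geocode": {"lat": 55.7, "lon": 37.6}}'
record = json.loads(line)
types = {key: detect_type(value) for key, value in record.items()}
```

The point of the sketch is that nested values are inspected as parsed objects rather than converted to strings before the type is decided.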

Named entity recognition technology helps to identify named objects inside texts. **Strengths** - allows identifying objects inside text blobs - could allow supporting more named entities (identifiers) **Weakness**...

enhancement

No tests exist right now; add them.

bug

title: Russian weather by radiometric analysis
dataformat: IMG
datapublisher: State enterprise "Central Aerological Observatory"
dataurl: http://www.nowcast.ru/ , http://www.nowcast.ru/data/uvk.html
author: Ivan Begtin
authorurl: http://infoculture.ru
What's bad? 1. Data or documents unavailable....