metacrafter

Metadata and data identification tool and Python library. Identifies PII, common identifiers, and language-specific identifiers. Fully customizable and flexible rules.

Results 19 metacrafter issues

Without a cache, the tool reloads its rules on every run, which makes it harder to process thousands of datasets from the command line.

enhancement
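One way such a cache could work is sketched below. This is not metacrafter's implementation: `load_rules_cached`, `parse_rules`, and `cache_path` are hypothetical names, and the sketch assumes the parsed rules can be pickled. The cache is reused only while the rule file's modification time is unchanged.

```python
import os
import pickle


def load_rules_cached(rule_path, parse_rules, cache_path):
    """Return parsed rules, reusing a pickle cache while rule_path is unchanged.

    parse_rules is a hypothetical callable that turns a rule file into
    any picklable object; it stands in for the real (slow) rule loader.
    """
    stamp = os.stat(rule_path).st_mtime_ns
    try:
        with open(cache_path, "rb") as fh:
            cached = pickle.load(fh)
        if cached["stamp"] == stamp:
            return cached["rules"]
    except (OSError, EOFError, KeyError, TypeError, pickle.UnpicklingError):
        # Missing, truncated, or unexpected cache content: fall through and rebuild.
        pass
    rules = parse_rules(rule_path)
    with open(cache_path, "wb") as fh:
        pickle.dump({"stamp": stamp, "rules": rules}, fh)
    return rules
```

Keying the cache on the modification time keeps the check cheap; a content hash would be more robust if rule files are rewritten without changes.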

Some fields of databases are just incremental unique identifiers generated by the database engine. They can't be linked with any external identifier databases and are used only locally by databases....

enhancement
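A possible heuristic for spotting such columns is sketched below. It is an illustration, not metacrafter's rule: the function name `looks_like_autoincrement` and the `min_density` threshold are assumptions. A column is flagged when its values are unique non-negative integers that densely fill the range between their minimum and maximum.

```python
def looks_like_autoincrement(values, min_density=0.9):
    """Guess whether a column holds database-generated incremental IDs.

    Hypothetical heuristic: every value is a unique non-negative integer,
    and the values cover at least min_density of the range [min, max].
    """
    try:
        numbers = [int(str(v).strip()) for v in values]
    except ValueError:
        return False
    if len(numbers) < 2 or len(set(numbers)) != len(numbers):
        return False
    if min(numbers) < 0:
        return False
    span = max(numbers) - min(numbers) + 1
    return len(numbers) / span >= min_density
```

The density check tolerates a few gaps left by deleted rows while rejecting sparse numeric codes such as postal codes.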

Exported CSV files sometimes pad values with leading or trailing whitespace for clearer formatting or to fit fixed-width data fields. Whitespace should be removed automatically using the `strip()` function....

enhancement
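A minimal sketch of the requested behavior, using only the standard `csv` module. The helper name `read_stripped_rows` is hypothetical and is not part of metacrafter; it strips whitespace from both header names and values.

```python
import csv
import io


def read_stripped_rows(text):
    """Parse CSV text and strip surrounding whitespace from keys and values."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for row in reader:
        rows.append({
            # Non-string keys/values (from short or overlong rows) pass through.
            (k.strip() if isinstance(k, str) else k):
            (v.strip() if isinstance(v, str) else v)
            for k, v in row.items()
        })
    return rows


rows = read_stripped_rows("id , name \n 1 ,  Alice \n 2 , Bob\n")
```

Here `rows` holds clean keys and values, so downstream rules see `"Alice"` rather than `"  Alice "`.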

Flat table datasets (CSV files, database tables) and sometimes objects with nested objects often include elements that could be grouped. For example, the CSV file [Zaara_D.csv](https://github.com/apicrafter/metacrafter/files/9274615/Zaara_D.csv) includes the following fields: title, text,...

enhancement

Nested documents in JSON, JSON Lines, XML, etc. are detected as str objects instead of dict or array objects. Example: the nested objects `Scores` and `Geocode` are detected as strings. - [ ] implement detection...

bug
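The expected behavior can be illustrated with a small sketch. The helper `field_types` and the sample record are hypothetical; only the field names `Scores` and `Geocode` come from the report. The sketch inspects the parsed Python value of each field rather than its string form.

```python
import json

# Map Python types produced by json.loads to reported type names.
TYPE_NAMES = {
    dict: "dict",
    list: "array",
    str: "str",
    bool: "bool",
    int: "int",
    float: "float",
    type(None): "null",
}


def field_types(record):
    """Return the detected type name for each top-level field of a record."""
    return {key: TYPE_NAMES[type(value)] for key, value in record.items()}


# Sample JSON Lines record with made-up values.
line = '{"Name": "Cafe", "Scores": [5, 4], "Geocode": {"lat": 1.5, "lon": 2.5}}'
types = field_types(json.loads(line))
```

With this approach `Scores` is reported as an array and `Geocode` as a dict, because the type is read before any value is converted to a string.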

Named entity recognition technology helps to identify named objects inside texts. **Strong** - allows identifying objects inside text blobs - could allow supporting more named entities (identifiers) **Weakness**...

enhancement

Thank you for open-sourcing this handy tool! I was trying to install the package from pip and source, but neither works out-of-the-box. From my end (Ubuntu with Python 3.10), running...

bug

Thank you for open-sourcing this package! I was wondering if the following behavior is expected when running `metacrafter scan-file --format short world+City.csv`: > Processing file /data/bird_sql/train_csv/world+City.csv > > 2024-07-03...

bug