autonormalize

Python library for automated dataset normalization

21 autonormalize issues

This PR fixes two issues that were identified while trying to replicate the error described in Issue #19. The first change addresses a problem that resulted from trying to do...

Checks whether a table is normalized, returning true or false:
- finds functional dependencies in the data
- sees if there are any partial or transitive dependencies
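The check described above can be sketched as follows, assuming the functional dependencies have already been discovered and are given as `(lhs, rhs)` pairs, with `lhs` a `frozenset` of column names, and assuming a single known primary key. `attribute_closure` and `is_normalized` are hypothetical names, not autonormalize API. The sketch flags a non-key column whenever its determinant is not a superkey, which covers both partial dependencies (the determinant is a proper subset of the key) and transitive dependencies (the determinant is some other non-key set).

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

FD = Tuple[FrozenSet[str], str]  # (determinant columns, dependent column)


def attribute_closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Return every column functionally determined by `attrs` under `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and rhs not in result:
                result.add(rhs)
                changed = True
    return result


def is_normalized(columns: Iterable[str], key: Iterable[str], fds: List[FD]) -> bool:
    """Hypothetical check: True if no non-key column has a partial or
    transitive dependency, relative to the single given primary key."""
    all_cols = set(columns)
    key_cols = set(key)
    for lhs, rhs in fds:
        if rhs in lhs or rhs in key_cols:
            continue  # trivial dependency, or the dependent column is part of the key
        if not all_cols <= attribute_closure(lhs, fds):
            # lhs is not a superkey: either a proper subset of the key
            # (partial dependency) or a non-key set (transitive dependency)
            return False
    return True


cols = ["id", "zip", "city"]
fds = [(frozenset({"id"}), "zip"), (frozenset({"id"}), "city"), (frozenset({"zip"}), "city")]
print(is_normalized(cols, ["id"], fds))      # False: id -> zip -> city is transitive
print(is_normalized(cols, ["id"], fds[:2]))  # True
```

Treating only the given key's columns as "prime" is a simplification; a table with several candidate keys would need all of them considered.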

enhancement

Implement the FDEP algorithm, which is more efficient for tables with many columns. The algorithm used would then be chosen based on the dimensions of the provided dataset. Scientific paper: https://www.lri.fr/~pierres/donn%E9es/save/these/articles/lpr-queue/database-dependency-discovery.pdf
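FDEP works bottom-up from pairs of rows: the columns on which two rows agree can never determine a column on which they differ. The sketch below computes these agree sets and uses them to test a single candidate dependency; it is not a full FDEP implementation, which goes on to derive a minimal positive cover from the resulting negative cover. The `choose_algorithm` dispatcher and its cost model are hypothetical assumptions: pairwise row comparison grows with the square of the row count, while a search over the lattice of column subsets can grow with 2 to the power of the column count.

```python
from itertools import combinations
from typing import FrozenSet, Sequence, Set

Row = Sequence[object]


def agree_sets(rows: Sequence[Row]) -> Set[FrozenSet[int]]:
    """For every pair of rows, the set of column positions where they are equal."""
    sets: Set[FrozenSet[int]] = set()
    for r1, r2 in combinations(rows, 2):
        sets.add(frozenset(i for i, (a, b) in enumerate(zip(r1, r2)) if a == b))
    return sets


def dependency_holds(lhs: FrozenSet[int], rhs: int, ag: Set[FrozenSet[int]]) -> bool:
    """lhs -> rhs holds unless some pair of rows agrees on all of lhs but not on rhs."""
    return not any(lhs <= s and rhs not in s for s in ag)


def choose_algorithm(n_rows: int, n_cols: int) -> str:
    """Hypothetical dispatcher based on crude cost estimates (an assumption)."""
    pairwise_cost = n_rows * n_rows * n_cols  # FDEP-style row-pair comparison
    lattice_cost = (2 ** n_cols) * n_rows     # search over column subsets
    return "fdep" if pairwise_cost < lattice_cost else "lattice"


rows = [(1, "a", "x"), (2, "a", "x"), (3, "b", "y")]
ag = agree_sets(rows)
print(dependency_holds(frozenset({1}), 2, ag))  # True: column 1 determines column 2
print(dependency_holds(frozenset({2}), 0, ag))  # False: rows 0 and 1 agree on column 2 but not column 0
print(choose_algorithm(1000, 50))               # fdep
print(choose_algorithm(1000, 5))                # lattice
```

A real dispatcher would need thresholds tuned on benchmarks; the cost formulas here only capture the growth rates.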

enhancement

AutoNormalize should be available for download via conda-forge.
- Documentation on how to contribute a package: https://conda-forge.org/docs/maintainer/adding_pkgs.html
- Example PR of contributing a package: https://github.com/conda-forge/staged-recipes/pull/16033

- We should add a `release.md` file modeled on the ones in our other libraries:
  - https://github.com/alteryx/woodwork/blob/main/release.md
  - https://github.com/alteryx/featuretools/blob/main/release.md
- Some parts of these will apply to `autonormalize` and some will not

If the input dataframe has been initialized with logical types, does autonormalize lose this typing information on the output dataframes?

## Steps to check

1. Create an input dataframe
2. ...
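One way to check for lost typing is to compare each column's type in the input with its type in every output table that contains it. The sketch below compares pandas dtypes only; `dtype_changes` is a hypothetical helper, and `outputs` simulates normalized tables rather than calling autonormalize. If Woodwork typing is attached, an analogous comparison could be run on the `df.ww.logical_types` dictionaries.

```python
from typing import Dict, List, Tuple

import pandas as pd


def dtype_changes(
    input_df: pd.DataFrame, output_dfs: Dict[str, pd.DataFrame]
) -> List[Tuple[str, str, str, str]]:
    """Return (table, column, input dtype, output dtype) for every column whose
    pandas dtype differs between the input and an output dataframe."""
    changes = []
    for table, out in output_dfs.items():
        for col in out.columns.intersection(input_df.columns):
            before, after = input_df[col].dtype, out[col].dtype
            if before != after:
                changes.append((table, col, str(before), str(after)))
    return changes


# Stand-in for normalization output: a lookup table whose category dtype was lost.
df = pd.DataFrame({"id": [1, 2, 3], "zip": ["a", "a", "b"]}).astype({"zip": "category"})
outputs = {
    "orders": df[["id", "zip"]],
    "zips": df[["zip"]].drop_duplicates().astype({"zip": "object"}),
}
print(dtype_changes(df, outputs))  # [('zips', 'zip', 'category', 'object')]
```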

- Adds variable types as parameters to `auto_entityset` and `make_entityset`
- Adds tests in `tests/test_normalize`
- Updates the README
- Resolves #10

Using this [dataset.csv.zip](https://github.com/FeatureLabs/autonormalize/files/4418093/dataset.csv.zip):

```python
import pandas as pd
import autonormalize as an

data = pd.read_csv('dataset.csv.zip')
es = an.auto_entityset(data, name="fraud", index='id', time_index='datetime')
```

results in the following error:

```
ValueError:...
```

Unable to add relationship because LotArea_LandContour in LotArea_LandContour is Pandas dtype int32 and LotArea_LandContour in index is Pandas dtype int64.
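One possible workaround, sketched here as an assumption rather than the fix adopted upstream, is to cast the key column in one dataframe to the dtype of the matching column in the other before the relationship is added. `align_key_dtype` is a hypothetical helper; casting toward the wider type (here `int32` to `int64`) avoids overflow.

```python
import pandas as pd


def align_key_dtype(child: pd.DataFrame, parent: pd.DataFrame, key: str) -> pd.DataFrame:
    """Return a copy of `child` whose `key` column is cast to the parent's dtype,
    so both sides of the relationship have matching dtypes."""
    if child[key].dtype != parent[key].dtype:
        # assign() returns a new dataframe, so the caller's `child` is not mutated
        child = child.assign(**{key: child[key].astype(parent[key].dtype)})
    return child


key = "LotArea_LandContour"
child = pd.DataFrame({key: pd.Series([1, 2], dtype="int32"), "other": [10, 20]})
parent = pd.DataFrame({key: pd.Series([1, 2], dtype="int64")})
fixed = align_key_dtype(child, parent, key)
print(child[key].dtype, fixed[key].dtype)  # int32 int64
```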

Full trace of install (I ran `pip3 uninstall featuretools` and `pip3 uninstall autonormalize` beforehand just to be sure):

```
maxpagels@Maxs-MacBook-Pro:~$ pip3 install featuretools[autonormalize]
Collecting featuretools[autonormalize]
  Using cached https://files.pythonhosted.org/packages/a2/4c/79f3ad4ad7bc5c195529969b4aa989223a7d211d727366fde5badb741aad/featuretools-0.10.1-py3-none-any.whl
Requirement already...
```