pydqc

python automatic data quality check toolkit

Results 23 pydqc issues

I was getting the error `ModuleNotFoundError: No module named 'sklearn.externals.joblib'`. After some googling[1], it turns out that sklearn removed joblib from the externals lib, and expects it to just be at...

Running `from pydqc import infer_schema, data_summary` raises `ModuleNotFoundError: No module named 'sklearn.externals.joblib'` because this module is no longer shipped in scikit-learn. Fixed by specifying the version of scikit-learn...
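Both reports above describe the same import failure. As an alternative to pinning an older scikit-learn, a fallback import can be sketched as below. This is a minimal illustration, not the fix pydqc ships: the helper name `import_first_available` is hypothetical, and it assumes that a standalone `joblib` package is installed in environments where `sklearn.externals.joblib` is missing.

```python
import importlib


def import_first_available(candidates):
    """Return the first module in `candidates` that imports successfully.

    Hypothetical helper: tries each dotted module name in order and
    collects the failures so the final error message is informative.
    """
    errors = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
            errors.append(f"{name}: {exc}")
    raise ImportError(
        "none of the candidate modules could be imported: " + "; ".join(errors)
    )
```

A caller could then write `joblib = import_first_available(["joblib", "sklearn.externals.joblib"])`, which prefers the standalone package and only falls back to the old location when it is still present.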

Hi There, This looks like a great package and I was testing your package for my own automation. So, I have created the schema of my dataframe with the infer_schema()...

bug

### Cannot convert -#.##### to Excel

Here is an error log from trying to extract the data summary; it seems the spreadsheet writer is unable to write float values ``` ---------------------------------------------------------------------------...

I was able to run and modify the output of the infer_schema function. However, when running the data summary function, I keep getting the error `str indices must be int`....

In `def _compare_key(key, _df1, _df2, img_dir):` and `def _compare_numeric(col, _df1, _df2, img_dir, date_flag=False):`, the labels `df1_name` and `df2_name` are not defined, so the code crashes during data comparison.

I think closing the files would be the most useful change. It would help when you run `pydqc` several times.

The runtime error `Selected KDE bandwidth is 0. Cannot estimate density.` is thrown when trying to plot the KDE of data with the following characteristics: `default_df['overdue_accts'].value_counts()` > 0 43408 > 1...
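One plausible cause, stated as an assumption rather than a diagnosis of pydqc, is that a rule-of-thumb bandwidth selector scales with `min(std, IQR / 1.349)`. When most values are identical the interquartile range is zero, so the bandwidth collapses to zero. The sketch below uses only the standard library and hypothetical function names to check this before attempting a KDE plot, so the caller can fall back to a histogram or bar chart.

```python
import statistics


def rule_of_thumb_spread(values):
    """Return min(sample std, IQR / 1.349).

    This is the spread term used by Silverman/Scott-style bandwidth rules;
    that the failing plot relies on such a rule is an assumption here.
    """
    values = [float(v) for v in values]
    if len(values) < 2:
        return 0.0
    std = statistics.stdev(values)
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4)
    return min(std, (q3 - q1) / 1.349)


def can_plot_kde(values):
    """True when the rule-of-thumb spread is positive, so a bandwidth exists."""
    return rule_of_thumb_spread(values) > 0


# Hypothetical, heavily skewed column: almost every value is 0.
skewed = [0] * 1000 + [1] * 20 + [2] * 3
evenly_spread = list(range(100))
```

Here `can_plot_kde(skewed)` is false because both quartiles are 0, while `can_plot_kde(evenly_spread)` is true. A plotting routine could use this check to skip the density estimate for near-constant columns.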