pydqc
infer_schema TypeError: '<' not supported between instances of 'str' and 'int'
When pandas reads a column that contains a mix of integers and strings, its dtype is object, but the entries themselves keep their individual types (int or str). So infer_schema fails at np.unique with a TypeError, because np.unique sorts the values and an int can't be compared with a str.
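A minimal sketch to reproduce the error outside pydqc. The `sample_data` values here are made up for illustration; they only stand in for whatever mixed column infer_schema is sampling.

```python
import numpy as np
import pandas as pd

# Hypothetical sample column mixing ints and strings (not from pydqc itself).
sample_data = pd.Series([1, "a", 2, "b", 1])

print(sample_data.dtype)                              # object
print(sorted({type(v).__name__ for v in sample_data}))  # ['int', 'str']

try:
    # np.unique sorts the values, so it has to compare an int with a str.
    np.unique(sample_data)
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__, error)
```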
Changing line 113 of infer_schema.py to col_stat['sample_num_uni'] = len(np.unique(sample_data).astype(str))
might work?
Thanks
Any update on this bug @SauceCat ?
Hi @SauceCat !
@chrisgschon got pretty close to the answer. I managed to solve the issue by changing line 113 of infer_schema.py to:
col_stat['sample_num_uni'] = len(np.unique(sample_data.astype(str)))
and obviously the same in line 114.
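A small sketch of why the position of `.astype(str)` matters. The cast has to happen before `np.unique` is called, otherwise the sort has already failed. `sample_data` is again a made-up column, not pydqc's actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed column, for illustration only.
sample_data = pd.Series([1, "a", 2, "b", 1])

# Cast first: every element is a str, so np.unique can sort them.
num_unique = len(np.unique(sample_data.astype(str)))
print(num_unique)  # 4

# Cast afterwards: np.unique still sees the mixed values and raises.
try:
    len(np.unique(sample_data).astype(str))
    cast_after_ok = True
except TypeError:
    cast_after_ok = False
print(cast_after_ok)  # False

# Side effect of the cast: the int 1 and the string "1" become the same value.
collapsed = len(np.unique(pd.Series([1, "1"]).astype(str)))
print(collapsed)  # 1
```

Note that with this fix the integer 1 and the string "1" count as a single unique value, which may or may not matter for the schema statistics.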
Thanks! Sorry, I currently can't find any spare time for this project. It would be great if you could create a PR to help other people :)