pydqc

infer_schema TypeError: '<' not supported between instances of 'str' and 'int'

Open chrisgschon opened this issue 6 years ago • 3 comments

When pandas reads a column that contains a mix of integers and strings, the column's dtype is object, but each entry keeps its own type (int or str). infer_schema then fails at np.unique with a TypeError, because np.unique sorts the values and a str cannot be compared with an int.
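A minimal sketch that should reproduce the failure outside pydqc. The column contents here are made up for illustration, and only the np.unique call is taken from the report:

```python
import numpy as np
import pandas as pd

# Hypothetical sample column mixing ints and strings (not from pydqc itself).
sample_data = pd.Series([1, "a", 2, "b", 1])

# The column is stored as object, but each entry keeps its own Python type.
print(sample_data.dtype)
print(sorted({type(v).__name__ for v in sample_data}))

# np.unique sorts the values, so it has to compare str with int.
try:
    np.unique(sample_data.values)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)
```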

In line 113 of infer_schema.py, changing to col_stat['sample_num_uni'] = len(np.unique(sample_data).astype(str)) might work?

Thanks

chrisgschon avatar Jun 21 '18 16:06 chrisgschon

Any update on this bug @SauceCat ?

786country avatar Jan 15 '19 21:01 786country

Hi @SauceCat! @chrisgschon got pretty close to the answer. I managed to solve the issue by changing line 113 of infer_schema.py to: col_stat['sample_num_uni'] = len(np.unique(sample_data.astype(str))) and obviously the same in line 114.
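A small sketch of why the position of astype(str) matters, using a made-up column rather than real pydqc data. Casting after np.unique still fails, because the sort runs first. Casting before it works, with one caveat: the int 1 and the string "1" become the same value, so the count can be lower than what pandas' own nunique reports.

```python
import numpy as np
import pandas as pd

# Hypothetical mixed column; note it holds both the int 1 and the string "1".
sample_data = pd.Series([1, "a", 2, "b", 1, "1"])

# Cast AFTER np.unique: the sort inside np.unique still sees str and int.
try:
    np.unique(sample_data.values).astype(str)
    cast_after_fails = False
except TypeError:
    cast_after_fails = True

# Cast BEFORE np.unique: every value is a string, so sorting works.
num_uni = len(np.unique(sample_data.astype(str)))

# For comparison, Series.nunique hashes values and keeps 1 and "1" distinct.
num_uni_pandas = sample_data.nunique()

print(cast_after_fails, num_uni, num_uni_pandas)
```

Whether merging 1 and "1" is acceptable depends on how sample_num_uni is used downstream, which this thread does not cover.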

danielmicoski avatar May 14 '19 15:05 danielmicoski

Thanks! Sorry, I currently can't find any spare time for this project. It would be great if you could create a PR and help other people :)

SauceCat avatar May 14 '19 16:05 SauceCat