scorecardpy

MergeError when executing "woebin" function

Open kendalvictor opened this issue 4 years ago • 9 comments

Hi, a few days ago, after updating the pandas library to version 1.2.0, the "woebin" function of scorecardpy version '0.1.9.2' stopped working.

When I try to execute it, I get the following error:


MergeError                                Traceback (most recent call last)
in
----> 1 cortes = sc.woebin(
      2 data[
      3 (data[col_target].notnull())
      4 ].drop(
      5 [col for col in data.columns if 'target' in col and col != col_target] + col_no_review,

C:\ProgramData\Anaconda3\lib\site-packages\scorecardpy\woebin.py in woebin(dt, y, x, var_skip, breaks_list, special_values, stop_limit, count_distr_limit, bin_num_limit, positive, no_cores, print_step, method, ignore_const_cols, ignore_datetime_cols, check_cate_num, replace_blank, save_breaks_list, **kwargs)
    956 print(('{:'+str(len(str(xs_len)))+'.0f}/{} {}').format(i, xs_len, x_i), flush=True)
    957 # woebining on one variable
--> 958 bins[x_i] = woebin2(
    959 dtm = pd.DataFrame({'y':dt[y], 'variable':x_i, 'value':dt[x_i]}),
    960 breaks=breaks_list[x_i] if (breaks_list is not None) and (x_i in breaks_list.keys()) else None,

C:\ProgramData\Anaconda3\lib\site-packages\scorecardpy\woebin.py in woebin2(dtm, breaks, spl_val, init_count_distr, count_distr_limit, stop_limit, bin_num_limit, method)
    720 if method == 'tree':
    721 # 2.tree-like optimal binning
--> 722 bin_list = woebin2_tree(
    723 dtm, init_count_distr=init_count_distr, count_distr_limit=count_distr_limit,
    724 stop_limit=stop_limit, bin_num_limit=bin_num_limit, breaks=breaks, spl_val=spl_val)

C:\ProgramData\Anaconda3\lib\site-packages\scorecardpy\woebin.py in woebin2_tree(dtm, init_count_distr, count_distr_limit, stop_limit, bin_num_limit, breaks, spl_val)
    482 '''
    483 # initial binning
--> 484 bin_list = woebin2_init_bin(dtm, init_count_distr=init_count_distr, breaks=breaks, spl_val=spl_val)
    485 initial_binning = bin_list['initial_binning']
    486 binning_sv = bin_list['binning_sv']

C:\ProgramData\Anaconda3\lib\site-packages\scorecardpy\woebin.py in woebin2_init_bin(dtm, init_count_distr, breaks, spl_val)
    274
    275 # dtm $ binning_sv
--> 276 dtm_binsv_list = dtm_binning_sv(dtm, breaks, spl_val)
    277 dtm = dtm_binsv_list['dtm']
    278 binning_sv = dtm_binsv_list['binning_sv']

C:\ProgramData\Anaconda3\lib\site-packages\scorecardpy\woebin.py in dtm_binning_sv(dtm, breaks, spl_val)
    113 # sv_df = sv_df.assign(value = lambda x: x.value.astype(dtm['value'].dtypes))
    114 # dtm_sv & dtm
--> 115 dtm_sv = pd.merge(dtm.fillna("missing"), sv_df[['value']].fillna("missing"), how='inner', on='value', right_index=True)
    116 dtm = dtm[~dtm.index.isin(dtm_sv.index)].reset_index() if len(dtm_sv.index) < len(dtm.index) else None
    117 # dtm_sv = dtm.query('value in {}'.format(sv_df['value'].tolist()))

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py in merge(left, right, how, on, left_on, right_on, left_index, right_index, sort, suffixes, copy, indicator, validate)
     72 validate=None,
     73 ) -> "DataFrame":
---> 74 op = _MergeOperation(
     75 left,
     76 right,

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py in __init__(self, left, right, how, on, left_on, right_on, axis, left_index, right_index, sort, suffixes, copy, indicator, validate)
    648 warnings.warn(msg, UserWarning)
    649
--> 650 self._validate_specification()
    651
    652 cross_col = None

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\reshape\merge.py in _validate_specification(self)
   1301 )
   1302 if self.left_index or self.right_index:
-> 1303 raise MergeError(
   1304 'Can only pass argument "on" OR "left_index" '
   1305 'and "right_index", not a combination of both.'

MergeError: Can only pass argument "on" OR "left_index" and "right_index", not a combination of both.


kendalvictor, Jan 13 '21 03:01

@kendalvictor I think you have to downgrade pandas to 0.25.0. But before you downgrade, note that the pandas merge() method accepts either the "on" argument or the "left_index" and "right_index" arguments, not both. Here, the code tries to merge on the column "value" and on the index at the same time. I hope this helps.
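A minimal sketch of the conflict described above, assuming pandas 1.2.0 or later. The toy frames and the names `left`, `right`, `conflict_rejected`, and `dtm_sv_alt` are made up for illustration. The `isin` selection at the end is only one possible way to pick the special-value rows, guessed from the commented-out `query` line in the traceback; it is not necessarily the fix the maintainers applied.

```python
import pandas as pd
from pandas.errors import MergeError

# Toy stand-ins for the frames used in dtm_binning_sv (values are invented).
left = pd.DataFrame({"y": [0, 1, 0, 1], "value": ["A", "B", "missing", "C"]})
right = pd.DataFrame({"value": ["missing"]})

try:
    # Same argument combination as woebin.py line 115 in the traceback:
    # "on" together with "right_index".
    pd.merge(left, right, how="inner", on="value", right_index=True)
    conflict_rejected = False
except MergeError:
    conflict_rejected = True

# Assumed intent: keep the rows of `left` whose value is a special value,
# preserving the original index of `left`. This avoids merge() entirely.
dtm_sv_alt = left[left["value"].isin(right["value"])]
```

Because `isin` returns a boolean mask over `left`, the selected rows keep their original index labels, which is what the next line of `dtm_binning_sv` (the `dtm.index.isin(dtm_sv.index)` filter) appears to rely on.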

Okroshiashvili, Jan 13 '21 17:01

Hi @Okroshiashvili, the solution was to downgrade pandas to 1.1.3. Ideally, though, this error should be addressed in a new release of this library, since its "woebin" function currently does not work with pandas 1.2.0.
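For anyone who wants to detect the affected setup before calling "woebin", here is a small sketch using only the standard library. The helper names `parse_version` and `is_affected` are hypothetical, and the 1.2.0 threshold is an assumption based only on the versions reported in this issue (1.1.3 works, 1.2.0 fails).

```python
import re

def parse_version(text):
    """Turn a version string such as '1.2.0' into a tuple like (1, 2, 0).

    Only the leading digits of the first three components are used, so a
    suffix such as 'rc1' is ignored.
    """
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)

def is_affected(pandas_version, first_bad=(1, 2, 0)):
    """Return True if the given pandas version is at or above the first
    version reported as failing in this issue (an assumed threshold)."""
    return parse_version(pandas_version) >= first_bad
```

In a session it could be called as `is_affected(pd.__version__)` after `import pandas as pd`, and a True result would suggest pinning pandas to 1.1.3 as described above.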

kendalvictor, Jan 13 '21 23:01

I don't think a version incompatibility is surprising. I hope the maintainers will fix this, but until then, if your problem is solved, please close this issue :)

Okroshiashvili, Jan 14 '21 06:01

Solved by changing the pandas version from 1.2.0 to 1.1.3.

kendalvictor, Jan 14 '21 12:01

The bug should be fixed. Please check the latest version on GitHub.

ShichenXie, Mar 15 '21 13:03

But the problem still persists.

chenz1hao, Nov 11 '21 07:11

I am still having this problem with pandas 1.3.4. Is there a new workaround?

FairmoneyKunal, Jun 10 '22 04:06

Please install the latest version on GitHub and try again. It should be fixed.

ShichenXie, Jun 10 '22 08:06

I have the same problem with pandas 1.5.3.

VladOnMyOwn, Feb 12 '23 21:02