pymatch

Error: Perfect separation detected, results not available

Open wiekern opened this issue 6 years ago • 3 comments

Hi, I get the error in the title when calling fit_scores(). My data structure is shown in the image below.

I draw 2000 samples for test and 20000 for control to fit the matcher, but I have no clue why this error occurs (I have looked into the source code). In addition, I ran the example code for loan.csv successfully, so I wonder whether the data fields must be integers rather than strings. That said, the loan example data contains strings as well; see the image below.

Hope anyone can help, thanks!

wiekern avatar Nov 08 '19 13:11 wiekern

@wiekern Not sure if it helps you, but I had similar errors and was pretty stuck. After some basic data analysis, I realized I had a few input variables with a very limited distribution across groups (e.g. a binary age bin with 10,000 rows = 0 and 5 rows = 1). After removing these variables/features, I had no errors.

Again, not sure if that's applicable to you, but that was my (embarrassing) issue.
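
For anyone who wants to run that kind of check before fitting, here is a minimal sketch using pandas only. The function name `sparse_levels`, the `treated` group column, and the `min_count` and `max_levels` thresholds are all made up for illustration and are not part of pymatch. It cross-tabulates each low-cardinality column against the group column and flags levels that are rare or missing in one group.

```python
import pandas as pd


def sparse_levels(df, group_col, min_count=10, max_levels=20):
    """Flag (column, level, counts per group) where a level is rare or absent in a group."""
    flagged = []
    for col in df.columns.drop(group_col):
        # Only screen low-cardinality columns; continuous ones need a different check.
        if df[col].nunique() > max_levels:
            continue
        table = pd.crosstab(df[col], df[group_col])
        for level, counts in table.iterrows():
            if counts.min() < min_count:
                flagged.append((col, level, counts.to_dict()))
    return flagged


# Toy data (hypothetical): age_bin is 1 only for treated rows, so it splits the groups exactly.
demo = pd.DataFrame({
    "treated": [1] * 5 + [0] * 15,
    "age_bin": [1] * 5 + [0] * 15,
    "region": ["a", "b"] * 10,
})

for col, level, counts in sparse_levels(demo, "treated", min_count=1):
    print(col, level, counts)
```

On the toy data only `age_bin` is flagged, because each of its levels occurs in a single group. A sensible `min_count` depends on the sample size, so treat the default as a placeholder.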

mark-mediware avatar Jan 13 '20 19:01 mark-mediware

Thanks for your answer! In my view, the distribution might not be the problem. I am wondering whether the regression model supports string input, like the "text" column in my case. I think I must convert the text into numeric values or word embeddings (vectors).
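
One point that may be relevant to the string question: in the statsmodels formula interface (which uses patsy), a string column is treated as categorical and expanded into one indicator per distinct value. A free-text column in which nearly every row is unique would then yield roughly one indicator per row, and each of those predicts its row's group exactly. Whether pymatch builds its model this way is an assumption here, not something confirmed in this thread. The sketch below, with made-up data and a hypothetical helper `cardinality_report`, measures how unique each string column is and dummy-codes only the low-cardinality ones.

```python
import pandas as pd

# Toy data (hypothetical): "text" is free text, "plan" is a real categorical.
demo = pd.DataFrame({
    "treated": [1, 0, 0, 1, 0, 0],
    "text": ["great app", "slow login", "ok", "love it", "crashes", "fine"],
    "plan": ["free", "pro", "free", "pro", "free", "free"],
})


def cardinality_report(df, cols):
    """Share of distinct values per column (1.0 means every row is unique)."""
    return {c: df[c].nunique() / len(df) for c in cols}


report = cardinality_report(demo, ["text", "plan"])

# Keep only columns whose levels repeat often enough to be useful as categories.
# The 0.5 cutoff is an arbitrary illustration, not a recommended value.
keep = [c for c, share in report.items() if share < 0.5]

# One indicator column per remaining level, dropping the first level as the baseline.
encoded = pd.get_dummies(demo[keep], drop_first=True)
print(report)
print(list(encoded.columns))
```

The free-text column would need a different representation (for example, a few derived numeric features) or should be left out of the model.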

wiekern avatar Jan 14 '20 10:01 wiekern

model = sm.logit('Result ~ Year + Amount_Spent + Popularity_Rank', data = train_data).fit()
Traceback (most recent call last):
  File "", line 1, in
    model = sm.logit('Result ~ Year + Amount_Spent + Popularity_Rank', data = train_data).fit()
  File "C:\Users\UMANG\anaconda3\lib\site-packages\statsmodels\discrete\discrete_model.py", line 1963, in fit
    bnryfit = super().fit(start_params=start_params,
  File "C:\Users\UMANG\anaconda3\lib\site-packages\statsmodels\discrete\discrete_model.py", line 227, in fit
    mlefit = super().fit(start_params=start_params,
  File "C:\Users\UMANG\anaconda3\lib\site-packages\statsmodels\base\model.py", line 519, in fit
    xopt, retvals, optim_settings = optimizer._fit(f, score, start_params,
  File "C:\Users\UMANG\anaconda3\lib\site-packages\statsmodels\base\optimizer.py", line 215, in _fit
    xopt, retvals = func(objective, gradient, start_params, fargs, kwargs,
  File "C:\Users\UMANG\anaconda3\lib\site-packages\statsmodels\base\optimizer.py", line 327, in _fit_newton
    callback(newparams)
  File "C:\Users\UMANG\anaconda3\lib\site-packages\statsmodels\discrete\discrete_model.py", line 211, in _check_perfect_pred
    raise PerfectSeparationError(msg)
PerfectSeparationError: Perfect separation detected, results not available
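
A traceback like this is raised when the predictors classify the outcome exactly. One quick thing to rule out is a single numeric predictor whose value ranges do not overlap between the two classes. The sketch below uses pandas only. The helper `separating_columns` and the toy values are made up for illustration, reusing column names from the formula above. It cannot detect separation that arises only from a combination of several predictors.

```python
import pandas as pd


def separating_columns(df, outcome, predictors):
    """List numeric predictors whose ranges do not overlap between the two outcome classes."""
    classes = df[outcome].unique()
    if len(classes) != 2:
        raise ValueError("outcome must have exactly two classes")
    a = df[df[outcome] == classes[0]]
    b = df[df[outcome] == classes[1]]
    found = []
    for col in predictors:
        # Non-overlapping ranges mean a single threshold on this column splits the classes.
        if a[col].max() < b[col].min() or b[col].max() < a[col].min():
            found.append(col)
    return found


# Toy data (hypothetical): every Result == 1 row has a later Year than every Result == 0 row.
train_demo = pd.DataFrame({
    "Result": [0, 0, 0, 1, 1, 1],
    "Year": [2001, 2002, 2003, 2004, 2005, 2006],
    "Amount_Spent": [10, 50, 30, 20, 60, 40],
})

print(separating_columns(train_demo, "Result", ["Year", "Amount_Spent"]))  # ['Year']
```

If a column is flagged, dropping it, binning it more coarsely, or collecting more data with overlapping values are the usual options.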

umangdadhaniya avatar Jun 29 '21 18:06 umangdadhaniya