scorecardpy
woebin function fails on MS SQL Server
I'm trying to run woebin on an MS SQL Server instance using the German credit dataset, and it fails with the error below:
multiprocessing.pool.RemoteTraceback: """
Msg 39019, Level 16, State 2, Line 137
An external script error occurred:
Traceback (most recent call last):
  File "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES\lib\multiprocessing\pool.py", line 119, in worker
    result = (True, func(*args, **kwds))
  File "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES\lib\multiprocessing\pool.py", line 47, in starmapstar
    return list(itertools.starmap(args[0], args[1]))
  File "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\scorecardpy\woebin.py", line 702, in woebin2
    stop_limit=stop_limit, max_num_bin=max_num_bin, breaks=breaks, spl_val=spl_val)
  File "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\scorecardpy\woebin.py", line 473, in woebin2_tree
    bin_list = woebin2_init_bin(dtm, min_perc_fine_bin=min_perc_fine_bin, breaks=breaks, spl_val=spl_val)
  File "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\scorecardpy\woebin.py", line 292, in woebin2_init_bin
    brk = list(filter(lambda x: x>np.nanmin(xvalue) and x<np.nanmax(xvalue), brk))
  File "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\scorecardpy\woebin.py", line 292, in <lambda>

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "
  File "C:\PROGRA~1\MICROS~3\MSSQL1~1.MSS\MSSQL\EXTENS~1\MSSQLSERVER01\85A94319-2EE5-4271-A733-EFE1547B54C9\sqlindb.py", line 215, in transform
    bins = sc.woebin(train, y="risk")
  File "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES\lib\site-packages\scorecardpy\woebin.py", line 893, in woebin
    bins = dict(zip(xs, pool.starmap(woebin2, args)))
  File "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES\lib\multiprocessing\pool.py", line 268, in starmap
    return self._map_async(func, iterable, starmapstar, chunksize).get()
  File "C:\Program Files\Microsoft SQL Server\MSSQL14.MSSQLSERVER\PYTHON_SERVICES\lib\multiprocessing\pool.py", line 608, in get
    raise self._value
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
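The final ValueError is what pandas raises whenever a boolean Series is used where Python expects a single True/False value, for example as an operand of `and`. The frame at line 292 of woebin.py chains two comparisons with `and`, so one plausible (unconfirmed) reading is that one of those comparisons produced a Series rather than a scalar. The sketch below only reproduces that pandas behaviour in isolation; the function names and data are hypothetical, and it is not a diagnosis of scorecardpy itself.

```python
import pandas as pd


def in_range_with_and(values, lo, hi):
    # Python's `and` calls bool() on its left operand. For a Series with
    # more than one element, pandas refuses and raises ValueError.
    return values > lo and values < hi


def in_range_elementwise(values, lo, hi):
    # `&` combines the two boolean Series element by element, so no
    # single truth value is ever requested.
    return (values > lo) & (values < hi)


# Hypothetical data standing in for a column of values.
values = pd.Series([1.0, 5.0, 9.0])

try:
    in_range_with_and(values, 2, 8)
except ValueError as err:
    print("ValueError:", err)

print(in_range_elementwise(values, 2, 8).tolist())  # [False, True, False]
```

With plain Python scalars the `and` form works fine, which is why code like this only fails when an unexpected Series reaches the comparison.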
The failing line is bins = sc.woebin(app_train_imp, y="risk"), where app_train_imp is a pandas DataFrame that contains a column named 'risk'.
- Python version on the server: 3.5.2
- scorecardpy version: 0.1.7
- pandas version: 0.24.2
- MS SQL Server version: 2017
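Since the failing call passes a DataFrame with a 'risk' target column, a few cheap checks on the input can rule out common data problems before any binning happens. The helper below is a hypothetical sketch: the name check_woebin_input and the specific rules are assumptions, not part of scorecardpy's API. It looks for a missing target, duplicate column names (which make df[name] return a DataFrame instead of a Series), and a target without exactly two classes. It uses %-formatting so it also runs on the Python 3.5 interpreter listed above.

```python
import pandas as pd


def check_woebin_input(df, y):
    """Return a list of problems found in df (empty if none).

    Hypothetical pre-flight check; the rules are assumptions about what a
    binning call on df with target y would need.
    """
    problems = []

    # The target column must exist.
    if y not in df.columns:
        problems.append("target column %r is missing" % y)

    # Duplicate column names make df[name] return a DataFrame, not a Series.
    dupes = df.columns[df.columns.duplicated()].unique().tolist()
    if dupes:
        problems.append("duplicate column names: %s" % dupes)

    # Assumed requirement: a binary target with exactly two distinct values.
    # Skipped when the target itself is duplicated, since df[y] is then a DataFrame.
    if y in df.columns and y not in dupes:
        n_classes = df[y].dropna().nunique()
        if n_classes != 2:
            problems.append("target %r has %d distinct values, expected 2" % (y, n_classes))

    return problems


# Usage: a frame with a duplicated feature column.
frame = pd.DataFrame([[1, 2, 0], [3, 4, 1]], columns=["age", "age", "risk"])
print(check_woebin_input(frame, "risk"))  # ["duplicate column names: ['age']"]
```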
Try installing the latest version of scorecardpy from GitHub.
The problem doesn't exist with the latest version on GitHub. Thanks! Closing the issue.
Please ignore my previous comment. The issue still persists with the latest version of the code on GitHub, with the same error as before. Reopening the issue, since I had closed it.
Have you had a chance to look at this? Do you need any further details from my end?
Please provide a reproducible example; otherwise I don't know how to fix it.
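A reproducible example usually needs the package versions and a small slice of data that still triggers the error. One way to collect both with pandas alone is sketched below; make_repro_bundle is a hypothetical helper, the default row count and random seed are arbitrary, and the scorecardpy version would still need to be added by hand.

```python
import io
import platform

import numpy as np
import pandas as pd


def make_repro_bundle(df, n_rows=50, seed=0):
    """Collect environment details plus a small CSV sample of df.

    Hypothetical helper: n_rows and seed are arbitrary defaults.
    """
    # Sample without replacement, never asking for more rows than exist.
    sample = df.sample(n=min(n_rows, len(df)), random_state=seed)

    # Serialise the sample to CSV text that can be pasted into an issue.
    buf = io.StringIO()
    sample.to_csv(buf, index=False)

    return {
        "python": platform.python_version(),
        "numpy": np.__version__,
        "pandas": pd.__version__,
        "dtypes": {str(col): str(dtype) for col, dtype in df.dtypes.items()},
        "sample_csv": buf.getvalue(),
    }


# Usage with a tiny stand-in frame.
demo = pd.DataFrame({"age": [25, 40, 61], "risk": [0, 1, 0]})
bundle = make_repro_bundle(demo, n_rows=2)
print(bundle["python"], bundle["pandas"])
print(bundle["sample_csv"])
```

Sampling with a fixed random_state keeps the shared slice stable across runs, so the maintainer sees the same rows the reporter tested with.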