cfkstat

Results 12 issues of cfkstat

WOE binning: how can special values, such as a ratio variable whose denominator is 0, be assigned to a separate bin of their own?
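One possible reading of this request, as a minimal sketch only: special values (a sentinel for division by zero, or a missing value) are routed to a dedicated bin before the ordinary cut points are applied, and WOE is then computed per bin. Nothing here comes from any specific library. The names `assign_bin`, `woe_table`, and `SPECIAL_BIN`, the sentinel `-999.0`, and the 0.5 count smoothing are all assumptions made for illustration, and the sign convention (good over bad) is only one of the two in common use.

```python
import math
from collections import defaultdict

SPECIAL_BIN = "special"  # hypothetical label for the dedicated bin


def assign_bin(value, edges, special_values):
    """Return SPECIAL_BIN for special/missing values, else the index of the interval."""
    if value is None or (isinstance(value, float) and math.isnan(value)):
        return SPECIAL_BIN
    if value in special_values:
        return SPECIAL_BIN
    for i, edge in enumerate(edges):  # edges must be sorted ascending
        if value <= edge:
            return i
    return len(edges)


def woe_table(values, targets, edges, special_values, smoothing=0.5):
    """WOE per bin, here defined as ln(share of goods / share of bads).

    target == 1 means bad (overdue), target == 0 means good.
    `smoothing` is added to each bin count so empty cells do not divide by zero.
    """
    good = defaultdict(float)
    bad = defaultdict(float)
    for v, t in zip(values, targets):
        b = assign_bin(v, edges, special_values)
        if t == 1:
            bad[b] += 1
        else:
            good[b] += 1
    good_total = sum(good.values())
    bad_total = sum(bad.values())
    if good_total == 0 or bad_total == 0:
        raise ValueError("both good and bad samples are required")
    table = {}
    for b in set(good) | set(bad):
        good_share = (good[b] + smoothing) / good_total
        bad_share = (bad[b] + smoothing) / bad_total
        table[b] = math.log(good_share / bad_share)
    return table


# Example: -999.0 stands for "denominator was 0" (an assumed sentinel).
ratios = [0.1, 0.2, 0.6, 0.7, -999.0, -999.0]
overdue = [0, 0, 1, 0, 1, 1]
table = woe_table(ratios, overdue, edges=[0.5], special_values={-999.0})
```

Keeping the special bin outside the ordered intervals means it is never merged with a neighbouring bin purely because of its numeric sentinel value.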

How can I develop a scorecard that uses lasso or ridge for variable screening, to get a model that generalizes better than one built on the full set of variables?

enhancement

The reward_metric function in the example seems to receive only the training set's predicted and actual values?

HyperGBM [notebook]: visualizations are not displayed in Jupyter

How can I set a fixed test dataset to evaluate the model?

documentation

Maximize the model's AUC score on the training set and the validation set, while ensuring that the difference between the two AUCs is less than 0.02, or the difference between KS...

enhancement

Using glum and joblib with ray, I ran multiple models and found that threads could use 1 core, and if I set n_jobs=1, I could only use 50% of all...

The prediction produced from the PMML differs from the one given by the package. The precision of the node values in each tree differs from that in the PMML....

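When developing a risk scorecard, we usually need to find a logistic regression model that is considered optimal when it meets the following conditions (optimal under constraints, given the number of variables entering the model):
1. A training set (train) and a validation set (test) are given; they contain the final risk outcome (whether the customer became overdue) of loans from different points in time.
2. Score1 = AUC_train - if(abs(AUC_train-AUC_test) >= 0.015, abs(AUC_train-AUC_test), 0.5*abs(AUC_train-AUC_test))
3. Score2 = KS_train - if(abs(KS_train-KS_test) >= 0.03, abs(KS_train-KS_test), 0.5*abs(KS_train-KS_test))
Either Score1 or Score2 can serve as the evaluation function. The test set must not take part in model training, so cross-validation cannot be used; test may only be used to evaluate the model.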
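The two scores above can be written down directly. The following is a minimal pure-Python sketch, not tied to any particular AutoML or scorecard library: the function names `auc_score`, `ks_score`, and `penalized_score` are hypothetical, and the AUC here uses a simple pairwise count that is only practical for small samples. The thresholds 0.015 and 0.03 are the ones stated in the issue.

```python
from typing import Sequence


def auc_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the share of (bad, good) pairs where the bad gets the higher score.

    Ties count as half a win. O(n_pos * n_neg), so for illustration only.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def ks_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """KS as the largest gap between the cumulative score distributions of bads and goods."""
    pos_total = sum(1 for y in y_true if y == 1)
    neg_total = len(y_true) - pos_total
    if pos_total == 0 or neg_total == 0:
        raise ValueError("both classes must be present")
    pairs = sorted(zip(y_prob, y_true))
    cum_pos = cum_neg = 0
    ks = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume all samples that share the same score before comparing the CDFs.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_pos += 1
            else:
                cum_neg += 1
            j += 1
        ks = max(ks, abs(cum_pos / pos_total - cum_neg / neg_total))
        i = j
    return ks


def penalized_score(train_metric: float, test_metric: float, threshold: float) -> float:
    """Train metric minus the full gap if the gap reaches the threshold, else minus half the gap."""
    gap = abs(train_metric - test_metric)
    penalty = gap if gap >= threshold else 0.5 * gap
    return train_metric - penalty


def score1(y_train, p_train, y_test, p_test) -> float:
    """Score1 from the issue: AUC on train, penalized by the train/test AUC gap."""
    return penalized_score(auc_score(y_train, p_train), auc_score(y_test, p_test), 0.015)


def score2(y_train, p_train, y_test, p_test) -> float:
    """Score2 from the issue: KS on train, penalized by the train/test KS gap."""
    return penalized_score(ks_score(y_train, p_train), ks_score(y_test, p_test), 0.03)
```

Both scores need the fixed test set's labels and predictions in addition to the training ones, so a metric callback that only receives a single (actual, predicted) pair would have to capture the test data some other way, for example through a closure.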