fklearn
Is there a reason why the `object` in learner logs isn't inside the learner key?
Code sample
Taking a look at the return logs of the learners, e.g. the logistic regression one:
log = {'logistic_classification_learner': {
           'features': features,
           'target': target,
           'parameters': merged_params,
           'prediction_column': prediction_column,
           'package': "sklearn",
           'package_version': sk_version,
           'feature_importance': dict(zip(features, clf.coef_.flatten())),
           'training_samples': len(df)},
       'object': clf}
Problem description
Is there a reason why the object key isn't inside the logistic_classification_learner dictionary? This causes a problem when my pipeline has multiple learners: every learner writes its own top-level object key, so the final object is simply whichever learner comes last in the pipeline, and I lose the objects of the earlier learners.
For example, my pipeline is (logistic_regression, isotonic_calibration). Since the build_pipeline function merges the logs of the two learners, the final object holds only the isotonic calibration, and I lose the logistic_regression object.
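A minimal sketch of the overwrite, assuming the pipeline combines the learners' logs with a shallow dictionary merge where later keys win. This is a simplification for illustration, not fklearn's actual build_pipeline code; the shallow_merge helper and the string stand-ins for the fitted models are hypothetical.

```python
from typing import Any, Dict, List


def shallow_merge(logs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: merge dicts left to right; on duplicate keys the later value wins."""
    merged: Dict[str, Any] = {}
    for log in logs:
        merged.update(log)
    return merged


# Hypothetical logs shaped like the current learners: 'object' sits at the top level.
logistic_log = {"logistic_classification_learner": {"training_samples": 100},
                "object": "fitted LogisticRegression"}
isotonic_log = {"isotonic_calibration_learner": {"training_samples": 100},
                "object": "fitted IsotonicRegression"}

merged = shallow_merge([logistic_log, isotonic_log])
print(merged["object"])  # fitted IsotonicRegression (the logistic model was overwritten)
```

Both per-learner entries survive the merge, but there is only one top-level object slot, so the last learner's model replaces the earlier one.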
Expected behavior
Access all learner objects of the pipeline, not just the last one.
Possible solutions
Put the learner object inside the dictionary of the logs:
log = {'logistic_classification_learner': {
           'features': features,
           'target': target,
           'parameters': merged_params,
           'prediction_column': prediction_column,
           'package': "sklearn",
           'package_version': sk_version,
           'feature_importance': dict(zip(features, clf.coef_.flatten())),
           'training_samples': len(df),
           'object': clf}
       }
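Under the same simplifying assumption of a shallow merge (a hypothetical stand-in, not fklearn's code), the proposed layout keeps every fitted object, because each one lives under its own learner key instead of a shared top-level key. The learner logs and model placeholders below are illustrative only.

```python
from typing import Any, Dict, List


def shallow_merge(logs: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: merge dicts left to right; on duplicate keys the later value wins."""
    merged: Dict[str, Any] = {}
    for log in logs:
        merged.update(log)
    return merged


# Hypothetical logs using the proposed layout: 'object' is nested inside each learner's entry.
logistic_log = {"logistic_classification_learner": {"training_samples": 100,
                                                    "object": "fitted LogisticRegression"}}
isotonic_log = {"isotonic_calibration_learner": {"training_samples": 100,
                                                 "object": "fitted IsotonicRegression"}}

merged = shallow_merge([logistic_log, isotonic_log])

# Every learner's model is still reachable by learner name.
objects = {name: learner_log["object"] for name, learner_log in merged.items()}
print(objects)
```

Since the learner names differ, no key collides during the merge, and both models remain available.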
I'll double-check this, but it seems we have a typo. Looking at the code, this "object" key is supposed to be dropped to avoid a huge training log; I'm basing this on this line: https://github.com/nubank/fklearn/blob/master/src/fklearn/training/pipeline.py#L75
If the key were 'obj' instead of 'object', it would be dropped from your learner's log and would be available only under the '__fkml__' key, inside its learners entry. But since the key is named 'object', it is never dropped.
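A minimal sketch of the mismatch described above, assuming the pipeline removes a top-level key literally named 'obj' from the merged log. The drop_model_object helper and DROPPED_KEY constant are hypothetical and only mimic that step; they are not fklearn's code.

```python
from typing import Any, Dict

DROPPED_KEY = "obj"  # assumption: the key name the pipeline is expected to drop


def drop_model_object(merged_log: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: return a copy of the log without DROPPED_KEY, if present."""
    return {key: value for key, value in merged_log.items() if key != DROPPED_KEY}


with_obj = drop_model_object({"logistic_classification_learner": {}, "obj": "model"})
with_object = drop_model_object({"logistic_classification_learner": {}, "object": "model"})

print("obj" in with_obj)        # False: the key matched DROPPED_KEY and was removed
print("object" in with_object)  # True: 'object' does not match, so it stays in the log
```

Because the learner writes 'object' while the removal looks for 'obj', the filter never matches and the fitted model remains in the training log.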