[BUG] `binarization_type` not defaulting to "auto"
Describe the bug
Using a config for a LightGBM txt model file without explicitly setting binarization_type to "auto" fails to load the model, because it attempts to use the PickleBinarizer rather than the LightGBMBinarizer.
To Reproduce
Steps to reproduce the behavior:
- Run with a config set to use `lightgbm-binary_cls_model.txt` and do not set any config for `binarization_type`
- See error `mlup.errors.ModelBinarizationError: Error with deserialize model: could not find MARK`. The error will show that it is using `ml/binarization/pickle.py`, not `ml/binarization/lightgbm.py`
- Try again with an explicit config of `binarization_type: auto` and it will work
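For context, the "could not find MARK" message is what Python's own `pickle` module reports when it is handed bytes that are not a pickle stream. The sketch below illustrates this with the standard library only; it does not call mlup. It assumes the LightGBM text model begins with a `tree` header line, and the sample bytes are made up for illustration. The first byte, `t`, is pickle's TUPLE opcode, which expects a MARK already on the stack.

```python
import pickle

# Assumption: a LightGBM text model starts with a "tree" header line.
# These bytes are an illustrative stand-in, not a real model file.
FAKE_LIGHTGBM_TEXT = b"tree\nversion=v4\nnum_class=1\n"


def try_unpickle(raw: bytes) -> str:
    """Return 'ok' if raw unpickles, otherwise the unpickling error message."""
    try:
        pickle.loads(raw)
    except pickle.UnpicklingError as exc:
        return str(exc)
    return "ok"


# Text that is not a pickle stream fails inside the unpickler.
message = try_unpickle(FAKE_LIGHTGBM_TEXT)
print(message)

# A genuine pickle round-trips without error.
print(try_unpickle(pickle.dumps([1, 2, 3])))
```

If the same message appears in the traceback, that is consistent with the text file being passed to a pickle-based loader.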
Expected behavior
It should auto-detect that it is lightgbm and choose the LightGBMBinarizer.
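As a rough illustration of what content-based detection could look like, here is a minimal sketch. It is not mlup's actual detection logic: `guess_model_format` is a hypothetical helper, and the checks rest on two assumptions, that pickles written with protocol 2 or later start with the byte `0x80`, and that LightGBM text models start with a `tree` header line.

```python
import pickle


def guess_model_format(head: bytes) -> str:
    """Guess a model file's format from its first bytes (hypothetical helper)."""
    # Pickle protocol 2+ streams begin with the PROTO opcode, byte 0x80.
    if head[:1] == b"\x80":
        return "pickle"
    # Assumption: LightGBM text dumps begin with a "tree" header line.
    if head.lstrip().startswith(b"tree"):
        return "lightgbm-text"
    return "unknown"


print(guess_model_format(pickle.dumps({"a": 1})))
print(guess_model_format(b"tree\nversion=v4\n"))
```

A real implementation would read only the first few bytes of the file and would need to cover the other supported formats as well.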
Environment (please complete the following information):
- Python version 3.12.7
- Version PyMLup 0.2.2
- LightGBM 4.5.0
Thanks for this issue. I need to check this bug soon 👀
@neverfox I couldn't reproduce your error scenario. I think I'm doing something wrong :smile:
Could you clarify how I can reproduce it?
I've tried two ways:
- bash command `mlup run -c bug-lightgbm-conf.yaml`.
- From python code.
Bash
To test the script, I used the following configuration file:
version: '1'
ml:
  auto_detect_predict_params: true
  storage_kwargs:
    files_mask: '(\w.-_)*.txt'
    path_to_files: models/lightgbm-binary_cls_model.txt
  storage_type: mlup.ml.storage.local_disk.DiskStorage
And also, with the addition of binarization_type: auto:
version: '1'
ml:
  auto_detect_predict_params: true
  binarization_type: auto
  storage_kwargs:
    files_mask: '(\w.-_)*.txt'
    path_to_files: models/lightgbm-binary_cls_model.txt
  storage_type: mlup.ml.storage.local_disk.DiskStorage
I used the following command: mlup run -c ./bug-lightgbm-conf.yaml.
The binarizer search worked as expected: the LightGBM binarizer was selected in both cases.
Attaching logs:
Python code
To check, I used the following script:
from mlup import up
from mlup.constants import StorageType

_up = up.UP(
    conf=up.Config(
        storage_type=StorageType.disk,
        storage_kwargs={
            'path_to_files': "models/lightgbm-binary_cls_model.txt",
            'files_mask': r"(\w.-_)*.txt"
        },
    )
)
_up.ml.load()
print("There are no errors")

_up_from_conf = up.UP.load_from_yaml(
    conf_path="bug-lightgbm-conf.yaml",
    load_model=True
)
print("There are no errors after load from yaml")
From the Python interpreter, the search for the binarizer also worked as expected.
Attaching logs:
Environment
- Python version 3.12.4
- Version PyMLup 0.2.2
- LightGBM 4.5.0