pymlup

[BUG] `binarization_type` not defaulting to "auto"

Open • neverfox opened this issue 1 year ago • 2 comments

Describe the bug

Using a config for a LightGBM `.txt` model file without explicitly setting `binarization_type` to `"auto"` fails to load the model, because PyMLup attempts to use the PickleBinarizer rather than the LightGBMBinarizer.

To Reproduce

Steps to reproduce the behavior:

  1. Run with a config that points to `lightgbm-binary_cls_model.txt` and does not set `binarization_type` at all.
  2. See the error `mlup.errors.ModelBinarizationError: Error with deserialize model: could not find MARK`. The error shows that it is using `ml/binarization/pickle.py`, not `ml/binarization/lightgbm.py`.
  3. Try again with `binarization_type: auto` set explicitly in the config; it works.
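For context, `could not find MARK` is the message Python's `pickle` module raises when it is handed a plain-text file instead of a pickle. A minimal sketch, assuming the model file begins with a `tree` line (as LightGBM text models written by `Booster.save_model` do): the first byte `t` is pickle's `TUPLE` opcode, which expects an earlier MARK on the stack, so unpickling fails immediately. The header bytes below are an illustrative stand-in, not the reporter's actual file.

```python
import pickle

# Illustrative stand-in for the first lines of a LightGBM text model file
# (assumed format; the real file continues with many more lines).
LIGHTGBM_TEXT_HEADER = b"tree\nversion=v4\nnum_class=1\n"


def try_unpickle(data: bytes) -> str:
    """Return the UnpicklingError message for the data, or 'no error'."""
    try:
        pickle.loads(data)
    except pickle.UnpicklingError as exc:
        return str(exc)
    return "no error"


if __name__ == "__main__":
    # 't' is the TUPLE opcode; with no MARK pushed before it, loading fails.
    print(try_unpickle(LIGHTGBM_TEXT_HEADER))  # prints: could not find MARK
```

This matches the symptom in step 2: the text model is being passed to a pickle-based loader.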

Expected behavior

It should auto-detect that the model is a LightGBM model and choose the LightGBMBinarizer.
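The expected behavior amounts to treating a missing `binarization_type` exactly like `"auto"`. The sketch below illustrates that idea with simple content sniffing; it is hypothetical and is not PyMLup's implementation. `detect_binarizer` and `resolve_binarization_type` are illustrative names, the config is a plain dict rather than `up.Config`, and the returned strings only stand in for PyMLup's binarizer types. It assumes pickle files use protocol 2 or later (which start with the `0x80` PROTO opcode) and that LightGBM text models start with a `tree` line.

```python
import pickle


def detect_binarizer(head: bytes) -> str:
    """Hypothetical: guess a binarizer name from a model file's first bytes."""
    if head.startswith(b"\x80"):
        # Pickle protocol 2+ streams begin with the PROTO opcode (0x80).
        return "pickle"
    if head.startswith(b"tree\n"):
        # Assumed: LightGBM text models begin with a "tree" line.
        return "lightgbm"
    # Fail loudly instead of silently falling back to pickle.
    raise ValueError("unrecognized model format")


def resolve_binarization_type(conf: dict, head: bytes) -> str:
    """Hypothetical: treat a missing binarization_type the same as 'auto'."""
    requested = conf.get("binarization_type", "auto")
    if requested == "auto":
        return detect_binarizer(head)
    return requested


if __name__ == "__main__":
    lgbm_head = b"tree\nversion=v4\n"
    print(resolve_binarization_type({}, lgbm_head))
    print(resolve_binarization_type({"binarization_type": "auto"}, lgbm_head))
    print(resolve_binarization_type({}, pickle.dumps({"a": 1})))
```

Routing both the missing and the explicit `"auto"` value through the same branch is the point of the sketch; a detector that cannot recognize the file raises an error rather than quietly handing the file to the pickle loader.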

Environment

  • Python version 3.12.7
  • PyMLup version 0.2.2
  • LightGBM 4.5.0

neverfox • Oct 24 '24 15:10

Thanks for reporting this. I'll look into this bug soon 👀

nxexox • Oct 24 '24 17:10

@neverfox I couldn't reproduce your error. I suspect I'm setting something up differently 😄

Could you clarify how I can reproduce your scenario?

I've tried two ways:

  • the Bash command `mlup run -c bug-lightgbm-conf.yaml`;
  • Python code.

Bash

To test the scenario, I used the following configuration file:

```yaml
version: '1'
ml:
  auto_detect_predict_params: true
  storage_kwargs:
    files_mask: '(\w.-_)*.txt'
    path_to_files: models/lightgbm-binary_cls_model.txt
  storage_type: mlup.ml.storage.local_disk.DiskStorage
```

I also tried the same file with `binarization_type: auto` added:

```yaml
version: '1'
ml:
  auto_detect_predict_params: true
  binarization_type: auto
  storage_kwargs:
    files_mask: '(\w.-_)*.txt'
    path_to_files: models/lightgbm-binary_cls_model.txt
  storage_type: mlup.ml.storage.local_disk.DiskStorage
```

I ran `mlup run -c ./bug-lightgbm-conf.yaml`.

Binarizer detection worked as expected in both cases: the LIGHTGBM binarizer was the one used.

Attaching logs:

[Screenshot: logs]

Python code

To check, I used the following script:

```python
from mlup import up
from mlup.constants import StorageType


# Build the config in code, without setting binarization_type.
_up = up.UP(
    conf=up.Config(
        storage_type=StorageType.disk,
        storage_kwargs={
            'path_to_files': "models/lightgbm-binary_cls_model.txt",
            'files_mask': r"(\w.-_)*.txt"
        },
    )
)
# Load the model; the binarizer is selected during loading.
_up.ml.load()

print("There are no errors")

# Load the same setup from the YAML config used in the Bash test.
_up_from_conf = up.UP.load_from_yaml(
    conf_path="bug-lightgbm-conf.yaml",
    load_model=True
)

print("There are no errors after load from yaml")
```

When run from the Python interpreter, binarizer detection also worked as expected.

Attaching logs:

[Screenshot: logs]

Environment

  • Python version 3.12.4
  • PyMLup version 0.2.2
  • LightGBM 4.5.0

nxexox • Nov 06 '24 13:11