
TypeError when saving a model with `numpy.bool_` types

Open nishaq503 opened this issue 2 years ago • 11 comments

numpy.bool_ types are not being correctly serialized to json.

What is the current behavior?

The ComplexEncoder class (here) does not handle numpy.bool_, which is not JSON serializable. This raises a TypeError when saving certain models.

If the current behavior is a bug, please provide the steps to reproduce.

model = TabNetClassifier(...)
model.fit(...)  # training data and model parameters contain values of type numpy.bool_
model.save_model('path/to/model')
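The underlying failure can be sketched without TabNet at all. The snippet below is only an illustration: the `params` dictionary and its `flag` key are hypothetical stand-ins for whatever ends up in the saved parameters, not names taken from the library.

```python
import json

import numpy as np

# Hypothetical stand-in for a saved-parameters dictionary; the key is illustrative only.
params = {"flag": np.bool_(True)}

try:
    json.dumps(params)
    raised = False
except TypeError as err:
    # The standard encoder does not know how to serialize NumPy scalar types.
    raised = True
    print(f"TypeError: {err}")

# Casting to the built-in bool first serializes without complaint.
as_builtin = json.dumps({"flag": bool(params["flag"])})
print(as_builtin)
```

The exact wording of the error message can differ between NumPy versions, so the sketch only checks that a TypeError is raised.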

Expected behavior

numpy.bool_ should be cast to Python's bool before being serialized to JSON. Here is my suggested fix. Please let me know if this is acceptable for a PR:

class ComplexEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, np.int64):
            return int(obj)
        if isinstance(obj, np.bool_):
            return bool(obj)
        # Let the base class default method raise the TypeError
        return json.JSONEncoder.default(self, obj)
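As a quick sanity check of the suggestion above, the encoder can be exercised on its own. This is a minimal sketch: the `sample` dictionary and its keys are made up for illustration and do not come from the library.

```python
import json

import numpy as np


class ComplexEncoder(json.JSONEncoder):
    """Copy of the encoder suggested above, repeated so the sketch is self-contained."""

    def default(self, obj):
        if isinstance(obj, np.int64):
            return int(obj)
        if isinstance(obj, np.bool_):
            return bool(obj)
        # Let the base class default method raise the TypeError
        return json.JSONEncoder.default(self, obj)


# Illustrative values only; the keys are hypothetical.
sample = {"flag": np.bool_(True), "count": np.int64(3)}
encoded = json.dumps(sample, cls=ComplexEncoder)
print(encoded)
```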

Other relevant information:

poetry version: "poetry-core>=1.0.0"
python version: "^3.9"
Operating System: "Linux Kernel 5.18.14-arch1-1"
Additional tools: CUDA Version: 11.7, Driver Version: 515.57

Additional context

Here's a stacktrace:

  File ".venv/lib/python3.10/site-packages/pytorch_tabnet/abstract_model.py", line 375, in save_model
    json.dump(saved_params, f, cls=ComplexEncoder)
  File "/usr/lib/python3.10/json/__init__.py", line 179, in dump
    for chunk in iterable:
  File "/usr/lib/python3.10/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/usr/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.10/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib/python3.10/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File ".venv/lib/python3.10/site-packages/pytorch_tabnet/utils.py", line 339, in default
    return json.JSONEncoder.default(self, obj)
  File "/usr/lib/python3.10/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type bool_ is not JSON serializable

I ran into this when trying TabNet in a Kaggle competition. If you need to, you can look here in my code where the error happens.

nishaq503 avatar Aug 01 '22 16:08 nishaq503

Thanks for your contribution.

Could you please explain when the error occurs and when it does not? I don't understand how there can sometimes be a numpy.bool_ and sometimes a Python bool.

Training data won't change the model weights or architecture, so when does this occur?

Optimox avatar Aug 01 '22 18:08 Optimox

Thanks for the quick response. I hope the following answers your questions. I am happy to give more clarification and answer more questions that you might have.

When I save a model and the model_params.json file is being written (here), it seems that some model parameters in the saved_params dictionary are of type numpy.bool_, which cannot be serialized to JSON.

I know that python's bool can be serialized so I tried adding that extra if statement in the ComplexEncoder class. It worked and so I suggested it as a fix.

I don't know how, or even if, the training data can cause model parameters to take on the numpy.bool_ type. The original data were a combination of string, floating-point, integer, boolean, and categorical types. I preprocessed and encoded all training data to be of numpy.float16 dtype before feeding it to the model. The choice of numpy.float16 was mostly for memory concerns, as the data are quite large.

nishaq503 avatar Aug 02 '22 18:08 nishaq503

OK, thank you, I'll look into it! @eduardocarvp any idea when this could happen?

Optimox avatar Aug 02 '22 21:08 Optimox

I had a quick look, but I don't know where this can be coming from... I don't see how training data could change the weights either. Have you made any changes to the model/architecture at all?

Anyway, I agree with the fix, but it would be good to know why it's happening. I will have a deeper look later.

eduardocarvp avatar Aug 03 '22 12:08 eduardocarvp

I didn't change any of the internal structure of the model. If it helps, here is the set of input parameters I used:

model = TabNetClassifier(
    n_d=32,
    n_a=32,
    n_steps=3,
    gamma=1.3,
    n_independent=2,
    n_shared=2,
    momentum=0.02,
    lambda_sparse=1e-3,
    optimizer_fn=torch.optim.Adam,
    optimizer_params=dict(lr=1e-3, weight_decay=1e-3),
    scheduler_fn=torch.optim.lr_scheduler.CosineAnnealingWarmRestarts,
    scheduler_params={
        'T_0': 5,
        'eta_min': 1e-4,
        'T_mult': 1,
        'last_epoch': -1,
    },
    mask_type='entmax',
    seed=cfg.seed,
)

model.fit(
    numpy.array(train_x),
    numpy.array(train_y.values.ravel()),
    eval_set=[(numpy.array(valid_x), numpy.array(valid_y.values.ravel()))],
    max_epochs=128,
    patience=10,
    batch_size=1024,
    eval_metric=['auc', 'accuracy', AmexMetric],
    from_unsupervised=unsupervised_model,
)

AmexMetric is a custom metric that, partially, relies on computing an AUC-ROC score. I added the unsupervised_model after making that change to save the model.

nishaq503 avatar Aug 03 '22 15:08 nishaq503

@nishaq503,

Is there any chance that you could share a Kaggle notebook that reproduces your error?

How come this notebook https://www.kaggle.com/code/medali1992/amex-tabnetclassifier-feature-eng-0-791 seems to be working just fine?

Optimox avatar Aug 10 '22 14:08 Optimox

You can use torch to save and load the model!

import torch
torch.save(clf_model, "./model_1")

To load:

clf_model = torch.load("./model_1")

damvantai avatar Aug 18 '22 12:08 damvantai

@damvantai no, it's better to use the built-in method and do clf.save_model("your/path")

Optimox avatar Aug 18 '22 13:08 Optimox

I have the same problem with an int8:

Object of type int8 is not JSON serializable

It's also raised from the ComplexEncoder. It seems to come from {'preds_mapper': {'0': 0, '1': 1}}, where the values 0 and 1 have the type np.int8 (apparently because my target variable is an int8, just as the OP seems to use a bool for their target).

So as a workaround for the time being, one could cast the target variable to np.int64, which seems to be the only np.intX type that ComplexEncoder can encode right now.
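The workaround can be sketched as follows. Both `Int64OnlyEncoder` and `build_preds_mapper` are hypothetical stand-ins written for this illustration: the first assumes the encoder handles only np.int64, as described above, and the second assumes the mapper is built from the unique target values. Neither is copied from the library.

```python
import json

import numpy as np


class Int64OnlyEncoder(json.JSONEncoder):
    """Hypothetical stand-in: an encoder that only knows about np.int64."""

    def default(self, obj):
        if isinstance(obj, np.int64):
            return int(obj)
        return super().default(obj)


def build_preds_mapper(y):
    # Assumption: class index -> class label, with labels taken from the unique targets.
    return {str(i): label for i, label in enumerate(np.unique(y))}


y_int8 = np.array([0, 1, 1, 0], dtype=np.int8)

try:
    json.dumps(build_preds_mapper(y_int8), cls=Int64OnlyEncoder)
    int8_failed = False
except TypeError:
    # np.int8 labels are not handled by this encoder.
    int8_failed = True

# Workaround: cast the target to np.int64 before it reaches the model.
y_int64 = y_int8.astype(np.int64)
encoded = json.dumps(build_preds_mapper(y_int64), cls=Int64OnlyEncoder)
print(encoded)
```

In real use, the cast would be applied to the target array before calling fit, so that the stored labels are np.int64 from the start.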

andreas-wolf avatar Aug 22 '22 23:08 andreas-wolf

Thanks @andreas-wolf, does this happen in the AMEX competition as well? What environment are you using?

Optimox avatar Aug 23 '22 09:08 Optimox

@Optimox Hi. I don't know if that happens in the AMEX competition, but I guess so, since the JSON encoding is not working for dtypes other than np.int64.

Sorry for not being clear enough in my description of the problem. I've therefore attached a minimal working example that triggers the bug.

As I said, the problem is that y_train, i.e. the target variable, is of type bool (or np.int8 in my case) and you're only handling np.int64 in ComplexEncoder https://github.com/dreamquark-ai/tabnet/blob/5ac55834b32693abc4b22028a74475ee0440c2a5/pytorch_tabnet/utils.py#L338

https://github.com/dreamquark-ai/tabnet/blob/5ac55834b32693abc4b22028a74475ee0440c2a5/pytorch_tabnet/utils.py#L336-L341

  import os
  import wget
  import pandas as pd
  import numpy as np
  from pathlib import Path
  from pytorch_tabnet.tab_model import TabNetClassifier
  url = "https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
  dataset_name = 'census-income'
  out = Path(os.getcwd()+'/data/'+dataset_name+'.csv')
  out.parent.mkdir(parents=True, exist_ok=True)
  if out.exists():
      print("File already exists.")
  else:
      print("Downloading file...")
      wget.download(url, out.as_posix())
  features = ['39', ' 77516', ' 13']
  train = pd.read_csv(out)
  train = train[features + [' <=50K']]
  train['target'] = train[' <=50K'] == '<=50K'
  train = train.drop(columns=[' <=50K'])
  if "Set" not in train.columns:
      train["Set"] = np.random.choice(["train", "valid", "test"], p =[.8, .1, .1], size=(train.shape[0],))
  
  train_indices = train[train.Set=="train"].index
  valid_indices = train[train.Set=="valid"].index
  test_indices = train[train.Set=="test"].index
  
  X_train = train[features].values[train_indices]
  y_train = train['target'].values[train_indices]
  
  X_valid = train[features].values[valid_indices]
  y_valid = train['target'].values[valid_indices]
  
  X_test = train[features].values[test_indices]
  y_test = train['target'].values[test_indices]
  
  clf = TabNetClassifier()
  clf.fit(X_train=X_train, y_train=y_train,max_epochs=2)
  
  saving_path_name = "./tabnet_model_test_1"
  saved_filepath = clf.save_model(saving_path_name)

andreas-wolf avatar Aug 23 '22 13:08 andreas-wolf

https://github.com/dreamquark-ai/tabnet/blob/5ac55834b32693abc4b22028a74475ee0440c2a5/pytorch_tabnet/utils.py#L336-L341

How about replacing lines 338~339 with:

         if isinstance(obj, (np.generic, np.ndarray)): 
             return obj.tolist()

It seems that only the TabNetClassifier object has this problem. The types of the values in {'preds_mapper': {'0': 0, '1': 1}} are determined by the targets the user passes when calling TabNetClassifier.fit. Using the NumPy method tolist() solves all similar problems, not only for np.bool_ but also for np.int32 and other NumPy generic types. Alternatively, it may be better to convert the train_labels values to JSON-compatible types before assigning them to preds_mapper.

https://github.com/dreamquark-ai/tabnet/blob/cab643b156fdecfded51d70d29072fc43f397bbb/pytorch_tabnet/tab_model.py#L45-L64
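A minimal sketch of both ideas is shown below. The class name `GenericNumpyEncoder` and the sample values are made up for illustration; the second half assumes the mapper is built from the unique target values, which is an assumption rather than a copy of the library code.

```python
import json

import numpy as np


class GenericNumpyEncoder(json.JSONEncoder):
    """Illustrative encoder: converts any NumPy scalar or array via tolist()."""

    def default(self, obj):
        if isinstance(obj, (np.generic, np.ndarray)):
            return obj.tolist()
        return super().default(obj)


sample = {
    "bool": np.bool_(True),
    "int8": np.int8(1),
    "uint8": np.uint8(2),
    "int32": np.int32(3),
    "array": np.array([0, 1], dtype=np.int8),
}
encoded = json.dumps(sample, cls=GenericNumpyEncoder)
print(encoded)

# Alternative: convert the labels to built-in types up front with .item(),
# so that the default JSON encoder is enough.
y = np.array([True, False, True])
clean_mapper = {str(i): label.item() for i, label in enumerate(np.unique(y))}
clean_encoded = json.dumps(clean_mapper)
print(clean_encoded)
```

The encoder route fixes serialization in one place, while the conversion route keeps NumPy types out of the saved parameters altogether; which one fits better depends on how the labels are used elsewhere.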

ShihHsuanChen avatar Nov 30 '22 09:11 ShihHsuanChen

I had the same problem with uint8. I followed @ShihHsuanChen's tip, changed lines 338~339, as he mentioned, and it worked for me.

rafamarquesi avatar Dec 16 '22 20:12 rafamarquesi

thanks I'll fix this soon

Optimox avatar Dec 17 '22 21:12 Optimox

When will this be solved? :| @Optimox, is there any timeline? Also, is there any workaround for this?

gauravbrills avatar Jan 23 '23 14:01 gauravbrills

I don't have a timeline to share. I think making sure during training that the target column has type int instead of np.int should solve the problem. I never had this problem, to be honest.

Optimox avatar Jan 23 '23 15:01 Optimox

Ahh, I did try that (I think I did, based on the discussion thread)... yes, we do have some conversions in between. For now I had tried a joblib dump as a workaround.

gauravbrills avatar Jan 23 '23 15:01 gauravbrills

Thanks @Optimox, the above comment solved my issue. I converted back the label types, which I had been shrinking to save space.

gauravbrills avatar Jan 23 '23 16:01 gauravbrills