pytorch_tabular
category encoder fails when there is a value in valid which was not present in train
I have some sparse boolean columns with very few True values. When there happens to be no True in train but there are some in validation, the category encoder replaces the True values with NaNs. It took me a while to figure out the source of the error. The trace is this:
File "/home/scripts/mbrl-tools/.venv/lib/python3.10/site-packages/pytorch_tabular/tabular_model.py", line 754, in fit
datamodule = self.prepare_dataloader(
File "/home/scripts/mbrl-tools/.venv/lib/python3.10/site-packages/pytorch_tabular/tabular_model.py", line 537, in prepare_dataloader
datamodule.setup("fit")
File "/home/scripts/mbrl-tools/.venv/lib/python3.10/site-packages/pytorch_tabular/tabular_datamodule.py", line 510, in setup
self._cache_dataset()
File "/home/scripts/mbrl-tools/.venv/lib/python3.10/site-packages/pytorch_tabular/tabular_datamodule.py", line 456, in _cache_dataset
validation_dataset = TabularDataset(
File "/home/scripts/mbrl-tools/.venv/lib/python3.10/site-packages/pytorch_tabular/tabular_datamodule.py", line 78, in __init__
self.categorical_X = self.categorical_X.astype(np.int64).values
File "/home/scripts/mbrl-tools/.venv/lib/python3.10/site-packages/pandas/core/generic.py", line 6534, in astype
new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
File "/home/scripts/mbrl-tools/.venv/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 414, in astype
return self.apply(
File "/home/scripts/mbrl-tools/.venv/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 354, in apply
applied = getattr(b, f)(**kwargs)
File "/home/scripts/mbrl-tools/.venv/lib/python3.10/site-packages/pandas/core/internals/blocks.py", line 616, in astype
new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
File "/home/scripts/mbrl-tools/.venv/lib/python3.10/site-packages/pandas/core/dtypes/astype.py", line 238, in astype_array_safe
new_values = astype_array(values, dtype, copy=copy)
File "/home/scripts/mbrl-tools/.venv/lib/python3.10/site-packages/pandas/core/dtypes/astype.py", line 183, in astype_array
values = _astype_nansafe(values, dtype, copy=copy)
File "/home/scripts/mbrl-tools/.venv/lib/python3.10/site-packages/pandas/core/dtypes/astype.py", line 101, in _astype_nansafe
return _astype_float_to_int_nansafe(arr, dtype, copy)
File "/home/scripts/mbrl-tools/.venv/lib/python3.10/site-packages/pandas/core/dtypes/astype.py", line 146, in _astype_float_to_int_nansafe
raise IntCastingNaNError(
pandas.errors.IntCastingNaNError: Cannot convert non-finite values (NA or inf) to integer
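For illustration, here is a minimal sketch of the failure mode in plain pandas. It does not use pytorch_tabular's actual encoder; the train-only `mapping` dictionary and the `flag` column are assumptions standing in for whatever the library does internally. It only shows how a value unseen in train can become NaN and then break the `astype(np.int64)` call from the trace.

```python
import numpy as np
import pandas as pd

# Toy data: the boolean column has no True in train, but one in validation.
train = pd.DataFrame({"flag": [False, False, False]})
valid = pd.DataFrame({"flag": [False, True, False]})

# Hypothetical encoder: the category-to-index mapping is learned from train only.
mapping = {value: idx for idx, value in enumerate(train["flag"].unique().tolist())}

# Values missing from the mapping (here: True) come out as NaN,
# which also turns the column into float64.
encoded_valid = valid["flag"].map(mapping)

# Casting a float column that contains NaN to int64 raises.
# pandas.errors.IntCastingNaNError is a subclass of ValueError.
try:
    encoded_valid.astype(np.int64)
    cast_error = None
except ValueError as exc:
    cast_error = exc

print(encoded_valid.isna().sum(), type(cast_error).__name__)
```

If the real encoder behaves like this mapping, any column whose validation split contains a category absent from train would hit the same error.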
This shouldn't be the case. The category encoder is supposed to be robust enough to catch this. Let me take a look at it. #406 also seems to be related to the same issue.
@kegl Is there any way you can share a reproducible, self-contained minimal example?
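As a rough sketch of what "robust" could mean here, one common approach is to reserve an index for unseen or missing categories, so the encoded column never contains NaN. This is not pytorch_tabular's implementation; `UNKNOWN_INDEX`, `fit_mapping`, and `transform` are hypothetical names used only for illustration.

```python
import numpy as np
import pandas as pd

UNKNOWN_INDEX = 0  # assumption: index 0 is reserved for unseen or missing values


def fit_mapping(series: pd.Series) -> dict:
    """Learn a category-to-index mapping from train; known categories start at 1."""
    categories = series.dropna().unique().tolist()
    return {value: idx for idx, value in enumerate(categories, start=1)}


def transform(series: pd.Series, mapping: dict) -> pd.Series:
    """Encode a column, sending anything not seen in train to UNKNOWN_INDEX."""
    return series.map(mapping).fillna(UNKNOWN_INDEX).astype(np.int64)


train = pd.Series([False, False, False], name="flag")
valid = pd.Series([False, True, False], name="flag")

mapping = fit_mapping(train)
encoded_valid = transform(valid, mapping)
print(encoded_valid.tolist())
```

With this scheme, an embedding layer for the column would need one more row than the number of categories seen in train, to account for the reserved index.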
https://stackoverflow.com/help/minimal-reproducible-example
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.