Getting ZeroDivisionError while fitting the Timeseries

Open AhangarAamir opened this issue 1 year ago • 7 comments

Getting ZeroDivisionError while fitting the Timeseries

Expected behavior

Should run successfully

To Reproduce

  1. Download the dataset
  2. Create a new column: df["id_col"] = "default"
  3. Set DataValue as the target
  4. Set YearEnd as the timestamp column
  5. Set id_col as the id_column
  6. Set the frequency to Y
  7. Run

Logs

Traceback (most recent call last):
  File "autogluon/timeseries/autogluon_timeseries_trainer.py", line 406, in train_model
    self.model.fit(training_df, time_limit=trainingHyperparameters['time_limit'], presets="medium_quality",
  File "/envs/autogluonvenv/lib/python3.8/site-packages/autogluon/core/utils/decorators.py", line 31, in _call
    return f(*gargs, **gkwargs)
  File "/envs/autogluonvenv/lib/python3.8/site-packages/autogluon/timeseries/predictor.py", line 742, in fit
    self._learner.fit(
  File "/envs/autogluonvenv/lib/python3.8/site-packages/autogluon/timeseries/learner.py", line 64, in fit
    return self._fit(
  File "/envs/autogluonvenv/lib/python3.8/site-packages/autogluon/timeseries/learner.py", line 87, in _fit
    train_data = self.feature_generator.fit_transform(train_data, data_frame_name="train_data")
  File "envs/autogluonvenv/lib/python3.8/site-packages/autogluon/timeseries/utils/features.py", line 280, in fit_transform
    self.fit(data)
  File "/envs/autogluonvenv/lib/python3.8/site-packages/autogluon/timeseries/utils/features.py", line 169, in fit
    past_covariates_df = self.past_covariates_pipeline.fit_transform(data[self.past_covariates_names])
  File "/envs/autogluonvenv/lib/python3.8/site-packages/autogluon/timeseries/utils/features.py", line 106, in fit_transform
    transformed = self._convert_numerical_columns_to_float(super().fit_transform(X, *args, **kwargs))
  File "/envs/autogluonvenv/lib/python3.8/site-packages/autogluon/features/generators/pipeline.py", line 69, in fit_transform
    self._compute_post_memory_usage(X_out)
  File "/envs/autogluonvenv/lib/python3.8/site-packages/autogluon/features/generators/pipeline.py", line 130, in _compute_post_memory_usage
    self.post_memory_usage = get_approximate_df_mem_usage(X, sample_ratio=0.2).sum()
  File "/envs/autogluonvenv/lib/python3.8/site-packages/autogluon/common/utils/pandas_utils.py", line 23, in inner
    return func(*args, **kwargs)
  File "/envs/autogluonvenv/lib/python3.8/site-packages/autogluon/common/utils/pandas_utils.py", line 49, in get_approximate_df_mem_usage
    sample_ratio_cat = num_categories_sample / num_categories
ZeroDivisionError: division by zero

Installed Versions

autogluon==1.1.0
pandas==2.0.3
python3.8

AhangarAamir avatar Jul 09 '24 06:07 AhangarAamir

@shchur sir please reply on this

AhangarAamir avatar Jul 15 '24 06:07 AhangarAamir

Hi @AhangarAamir, can you please provide the code that reproduces this issue as well as the logs generated by the predictor?

shchur avatar Jul 15 '24 14:07 shchur

I just encountered this error. I was using the TimeSeriesPredictor with a column id that was unique for each row. Dropping that column fixed it: train = train.drop("id", axis=1).

jakobhuss avatar Jul 18 '24 11:07 jakobhuss

@jakobhuss ok good, but removing id is not the solution

AhangarAamir avatar Jul 19 '24 10:07 AhangarAamir

Hi @AhangarAamir, sorry for the slow response. I just had a look at the data and it seems that there is a problem with how the input TimeSeriesDataFrame is constructed in your example. When you set the id_col to the same value for all rows in the dataset, you are essentially telling AutoGluon that your data contains a single time series - even though the dataset contains many time series of different types. For the TimeSeriesPredictor to work correctly, you need to make sure that the value in the id_col is unique for each unique time series in your dataset.
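One way to satisfy the requirement described above is to derive the id from the columns that identify each series. The following is a minimal sketch on toy data; the column names "Location" and "Topic" are hypothetical stand-ins, since the real dataset's identifying columns are not shown in this thread.

```python
import pandas as pd

# Toy stand-in for the real dataset. "Location" and "Topic" are
# hypothetical column names chosen only for illustration.
df = pd.DataFrame(
    {
        "Location": ["CA", "CA", "NY", "NY"],
        "Topic": ["Asthma", "Asthma", "Asthma", "Asthma"],
        "YearEnd": ["2012-01-01", "2013-01-01", "2012-01-01", "2013-01-01"],
        "DataValue": [1.0, 2.0, 3.0, 4.0],
    }
)

# Assumption: these columns together identify one time series.
key_columns = ["Location", "Topic"]

# Build one id per series by joining the key columns row by row.
df["id_col"] = df[key_columns].astype(str).agg("_".join, axis=1)

# Sanity check: each (id, timestamp) pair should appear only once.
duplicates = int(df.duplicated(subset=["id_col", "YearEnd"]).sum())
print(df["id_col"].unique(), duplicates)
```

If the duplicate count is above zero, the chosen key columns probably do not separate the series fully, and more columns would need to be added to the key.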

I had a closer look at the data and it seems that there are only a few observations available in each time series (at most 8 observations). Here are the value counts for the YearEnd column:

YearEnd count
2013-01-01 88244
2014-01-01 74702
2012-01-01 33960
2010-01-01 22765
2011-01-01 18023
2008-01-01 108
2001-01-01 104
2007-01-01 55

In such a case, when only limited data is available, I would recommend using the TabularPredictor from autogluon.tabular, since time series forecasting models likely won't be able to work effectively with very short time series.

shchur avatar Jul 30 '24 11:07 shchur

@shchur Thank you for your reply, but TimeSeriesPredictor should not give a ZeroDivisionError. It should raise a proper exception with a proper message.
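As an illustration of what a clearer error could look like, here is a hedged sketch of a user-side check that could run before fitting. It is not part of AutoGluon; the function name check_categorical_columns is hypothetical. It assumes, based only on the division by num_categories in the traceback, that a categorical column with zero categories is one plausible trigger.

```python
import pandas as pd


def check_categorical_columns(df: pd.DataFrame) -> None:
    """Raise a descriptive error for categorical columns that have no categories.

    Hypothetical helper, not an AutoGluon API.
    """
    empty = [
        col
        for col in df.columns
        if isinstance(df[col].dtype, pd.CategoricalDtype)
        and len(df[col].cat.categories) == 0
    ]
    if empty:
        raise ValueError(
            f"Categorical columns with no categories (all values missing): {empty}. "
            "Drop or fill these columns before fitting."
        )


# Example: a categorical column that contains only missing values.
example = pd.DataFrame({"item_id": ["A", "B", "C"], "feat": float("nan")})
example["feat"] = example["feat"].astype("category")

try:
    check_categorical_columns(example)
except ValueError as err:
    print(err)
```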

AhangarAamir avatar Aug 01 '24 05:08 AhangarAamir

Thank you, that's a good point. Can you please share more details on how to reproduce the problem? I tried running the following code but the predictor trained normally:

import pandas as pd
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor
df = pd.read_csv("US_disease_cleaned.csv")
df["id"] = "default"
tsdf = TimeSeriesDataFrame(df, timestamp_column="YearEnd", id_column="id")
predictor = TimeSeriesPredictor(target="DataValue", freq="Y", eval_metric="RMSE").fit(tsdf)

shchur avatar Aug 01 '24 13:08 shchur

I've found an MWE for this issue:

import pandas as pd
from autogluon.timeseries.utils.features import ContinuousAndCategoricalFeatureGenerator

df = pd.DataFrame(
    {"item_id": ["A", "B", "C"], "feat": float("nan")}
)
df["feat"] = df["feat"].astype("category")

static_feature_pipeline = ContinuousAndCategoricalFeatureGenerator(minimum_cat_count=1)
static_feature_pipeline.fit_transform(df)

or using the public API:

import pandas as pd
from autogluon.timeseries import TimeSeriesPredictor, TimeSeriesDataFrame
data = TimeSeriesDataFrame("https://autogluon.s3.amazonaws.com/datasets/timeseries/m4_hourly_subset/train.csv")
data.static_features = pd.DataFrame({"item_id": data.item_ids, "feat": None}).astype({"feat": "category"})
predictor = TimeSeriesPredictor().fit(data, hyperparameters={"Naive": {}})

The problem seems to lie in the get_approximate_df_mem_usage function. We need to make sure that the number of categories is >= 1 (currently num_categories = 0 is possible).
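A guard of the kind described could look like the sketch below. This is not the actual AutoGluon code or fix: the variable names come from the traceback line `sample_ratio_cat = num_categories_sample / num_categories`, the function name is hypothetical, and returning 1.0 for an empty category set is an assumption made for illustration.

```python
def safe_category_sample_ratio(num_categories_sample: int, num_categories: int) -> float:
    """Ratio of sampled categories to all categories, guarded against zero.

    Hypothetical helper. With no categories there is nothing to scale,
    so this sketch assumes a neutral ratio of 1.0.
    """
    if num_categories == 0:
        return 1.0
    return num_categories_sample / num_categories


print(safe_category_sample_ratio(1, 4))  # ordinary case
print(safe_category_sample_ratio(0, 0))  # the case that raised ZeroDivisionError
```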

shchur avatar Apr 23 '25 08:04 shchur