[BUG] Segmentation fault on catboost model during forecasting with prediction intervals
🐛 Bug Report
Making a forecast with prediction intervals using a CatBoost model can cause a segmentation fault.
The error looks like:
Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
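For anyone trying to narrow this down, a minimal sketch of two standard-library aids follows. It shows how exit code 139 maps to SIGSEGV and how `faulthandler` can be enabled so that Python prints a traceback when the fatal signal arrives. Nothing here is specific to etna or CatBoost, and it assumes a POSIX system where SIGSEGV is signal 11.

```python
import faulthandler
import signal

# Ask Python to dump the traceback of every thread to stderr if the
# process receives a fatal signal such as SIGSEGV. This only shows the
# Python-level frames, not the native frames inside the extension.
faulthandler.enable()

# Shells and IDEs report "killed by signal N" as exit code 128 + N,
# so exit code 139 corresponds to signal 11 (SIGSEGV on Linux/macOS).
exit_code = 128 + signal.SIGSEGV
print(exit_code)
```

Placing `faulthandler.enable()` at the top of the reproduction script (or running it with `python -X faulthandler script.py`) should at least reveal which Python call was active when the crash happened.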
Expected behavior
No errors.
How To Reproduce
import numpy as np
import pandas as pd

from etna.models import CatBoostMultiSegmentModel
from etna.transforms import LagTransform, DateFlagsTransform
from etna.datasets import TSDataset
from etna.pipeline import Pipeline


def get_ts() -> TSDataset:
    rng = np.random.default_rng(0)
    periods = 100
    df1 = pd.DataFrame({"timestamp": pd.date_range("2020-01-01", periods=periods)})
    df1["segment"] = "segment_1"
    df1["target"] = rng.uniform(10, 20, size=periods)
    df2 = pd.DataFrame({"timestamp": pd.date_range("2020-01-01", periods=periods)})
    df2["segment"] = "segment_2"
    df2["target"] = rng.uniform(-15, 5, size=periods)
    df = pd.concat([df1, df2]).reset_index(drop=True)
    df = TSDataset.to_dataset(df)
    tsds = TSDataset(df, freq="D")
    return tsds


def main():
    ts = get_ts()
    model = CatBoostMultiSegmentModel(iterations=100)
    transforms = [DateFlagsTransform(), LagTransform(in_column="target", lags=list(range(3, 10)))]
    pipeline = Pipeline(model=model, transforms=transforms, horizon=3)
    pipeline.fit(ts)
    pipeline.forecast(prediction_interval=True)


if __name__ == "__main__":
    main()
Observations:
- The problem happens inside backtest, in the `_forecast_backtest_pipeline` method, on the second pipeline
- If you call `backtest` instead of `forecast`, the error doesn't happen
- If you rewrite `_run_all_folds` without parallel execution, using a list comprehension, the error remains
- Running catboost with `logging_level="Debug"` doesn't clear up the situation
- If you run `pipeline.forecast(prediction_interval=True, num_folds=5)`, the error happens on fold 3
- If you run `pipeline.forecast(prediction_interval=True, num_folds=8)`, the error happens on fold 1
- Changing `random_seed` doesn't change the fold on which the pipeline fails
- Removing the `tslogger` operations from `_forecast_backtest_pipeline` doesn't change the error
- Removing `DateFlagsTransform` from `transforms` stops the error
  - This hints that the problem may be related to categorical features
- Removing `LagTransform` from `transforms` doesn't stop the error
- Setting `thread_count=1` doesn't stop the error
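Because a segmentation fault kills the interpreter, checking several configurations like the ones above means restarting the script each time. One possible approach, sketched below using only the standard library, is to run each variant in a child interpreter and inspect its return code. The helper names `run_isolated` and `describe` are hypothetical and not part of etna. On POSIX, `subprocess` reports a child killed by signal N as return code `-N`.

```python
import signal
import subprocess
import sys


def run_isolated(code: str) -> int:
    """Run a Python snippet in a child interpreter and return its exit status."""
    result = subprocess.run([sys.executable, "-c", code])
    return result.returncode


def describe(returncode: int) -> str:
    """Turn a subprocess return code into a readable description (POSIX)."""
    if returncode < 0:
        # Negative values mean the child was terminated by a signal.
        return f"killed by {signal.Signals(-returncode).name}"
    return f"exited with code {returncode}"


if __name__ == "__main__":
    # Placeholder snippet; in practice this would hold the reproduction
    # code with a different num_folds or transform list for each run.
    status = run_isolated("print('variant finished')")
    print(describe(status))
```

With this layout the parent process survives a crash in the child, so a loop over fold counts or transform combinations can record which variants end with SIGSEGV.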
I haven't managed to reproduce the problem on a fresh installation, so it isn't obvious what causes it.
Environment
No response
Additional context
No response
Checklist
- [X] Bug appears at the latest library version