Create notebook about feature selection

Open alex-hse-repository opened this issue 2 years ago • 2 comments

🚀 Feature Request

Create a notebook demonstrating our methods for feature selection

Motivation

Show our feature selection methods to the users

Proposal

  1. Create a notebook with a short description and demonstration of our feature selection transforms (TreeFeatureSelectionTransform, GaleShapleyFeatureSelectionTransform, MRMRFeatureSelectionTransform)
  2. Also include the plot_feature_relevance method here

Test cases

No response

Alternatives

No response

Additional context

No response

Checklist

  • [ ] I discussed this issue with ETNA Team

alex-hse-repository avatar May 30 '22 13:05 alex-hse-repository

sudo py-spy record --rate 50 --nonblocking -o speedscope.json -f speedscope -- python f.py


# %% [markdown]
# # Feature selection
# 
# This notebook contains simple examples of using feature selection transforms with the ETNA library.
# 
# ### Navigation
# 
# - [Intro](#20-intro-to-feature-selection)
# - [TreeFeatureSelectionTransform](#21-treefeatureselectiontransform)
# - [GaleShapleyFeatureSelectionTransform](#22-galeshapleyfeatureselectiontransform)
# - [MRMRFeatureSelectionTransform](#23-mrmrfeatureselectiontransform)
# 

# %%
import warnings

warnings.filterwarnings("ignore")

# %% [markdown]
# ## 1. Load Dataset
# 
# We are going to work with the time series from the Tabular Playground Series - Jan 2022. The dataset contains daily merchandise sales (mugs, hats, and stickers) at two imaginary store chains across three Scandinavian countries. As exogenous data, we will use the Finland, Norway, and Sweden Weather Data 2015-2019 dataset, which contains daily country-average precipitation, snow depth, and air temperature.

# %%
import pandas as pd

df = pd.read_csv("examples/data/nordic_merch_sales.csv")

# %%
from etna.datasets import TSDataset

df = TSDataset.to_dataset(df)
ts = TSDataset(df, freq="D")
ts.plot(4)

# %%
HORIZON = 60

# %% [markdown]
# ## 2. Feature selection methods
# 
# ### 2.0 Intro to feature selection
# 
# Let's create features and build a pipeline on the dataset:

# %%
from etna.pipeline import Pipeline
from etna.models import CatBoostModelPerSegment
from etna.transforms import (
    DateFlagsTransform,
    MeanTransform,
    LagTransform,
    TrendTransform,
    FourierTransform,
    HolidayTransform,
)
from etna.metrics import SMAPE

transforms = [
    TrendTransform(in_column="target", out_column="trend"),
    LagTransform(in_column="target", lags=range(HORIZON, 100), out_column="target_lag"),
    DateFlagsTransform(
        day_number_in_month=True, day_number_in_week=False, is_weekend=False, out_column="datetime_flag"
    ),
    MeanTransform(in_column=f"target_lag_{HORIZON}", window=12, seasonality=7, out_column="mean_transform"),
    FourierTransform(period=250, order=6, out_column="fourier"),
    HolidayTransform(iso_code="SWE", out_column="SWE_holidays"),
    HolidayTransform(iso_code="NOR", out_column="NOR_holidays"),
    HolidayTransform(iso_code="FIN", out_column="FIN_holidays"),
]

# %% [markdown]
# With this simple transform, both SMAPE and backtest time improved by more than a factor of two.
# 
# ETNA also provides methods to plot the importance of each feature:

# %%
from etna.transforms import GaleShapleyFeatureSelectionTransform
from etna.analysis.feature_relevance import StatisticsRelevanceTable

rt = StatisticsRelevanceTable()
feature_selector_transform = GaleShapleyFeatureSelectionTransform(top_k=20, relevance_table=rt, return_features=True)


pipeline = Pipeline(
    model=CatBoostModelPerSegment(), transforms=transforms + [feature_selector_transform], horizon=HORIZON
)
metrics_galeshapley_feature_selector, forecast_galeshapley_feature_selector, _ = pipeline.backtest(
    ts=ts, metrics=[SMAPE()], n_folds=1
)
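
For readers unfamiliar with the idea behind Gale-Shapley selection, here is a toy sketch of one matching round in plain Python. It is an illustration under assumptions, not ETNA's implementation: the function name `gale_shapley_round`, the segment and feature names, and the relevance scores are all made up. Segments propose to features in descending order of relevance, and a contested feature keeps the segment that scores it highest.

```python
def gale_shapley_round(relevance):
    """One round of segment-to-feature matching (toy sketch, not ETNA code).

    relevance maps segment -> {feature: score}; every segment is assumed
    to score the same set of features.
    Returns a dict mapping each matched segment to its feature.
    """
    # Each segment's preference list: features by descending relevance.
    prefs = {
        seg: sorted(scores, key=scores.get, reverse=True)
        for seg, scores in relevance.items()
    }
    next_choice = {seg: 0 for seg in relevance}
    holder = {}  # feature -> segment currently matched to it
    free = list(relevance)
    while free:
        seg = free.pop()
        if next_choice[seg] >= len(prefs[seg]):
            continue  # segment ran out of features and stays unmatched
        feat = prefs[seg][next_choice[seg]]
        next_choice[seg] += 1
        current = holder.get(feat)
        if current is None:
            holder[feat] = seg
        elif relevance[seg][feat] > relevance[current][feat]:
            # The feature prefers the new proposer; the old one is free again.
            holder[feat] = seg
            free.append(current)
        else:
            free.append(seg)
    return {seg: feat for feat, seg in holder.items()}


# Hypothetical relevance scores for two segments and three features.
toy_relevance = {
    "seg_a": {"lag_60": 0.9, "trend": 0.5, "fourier_1": 0.1},
    "seg_b": {"lag_60": 0.8, "trend": 0.7, "fourier_1": 0.2},
}
matching = gale_shapley_round(toy_relevance)
selected = sorted(set(matching.values()))
print(matching)
print(selected)
```

Both segments rank `lag_60` first, so the one with the lower score falls back to its second choice, and the round selects two distinct features instead of one.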

martins0n avatar Sep 14 '22 11:09 martins0n

speedcope.json image

Maybe we can pass all columns to the Mann-Whitney test at once. The current implementation compares features separately.
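
One possible reading of this idea, sketched in plain Python: rank the target once and reuse those ranks for every binary feature column, instead of running a full test per feature. The helper names `rank_with_ties` and `mann_whitney_u_all` and the sample data are hypothetical. This computes only the U statistic, not p-values, and is not the code path ETNA uses.

```python
def rank_with_ties(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u_all(target, binary_features):
    """Compute the U statistic of every binary feature against one target.

    The target is ranked a single time; each feature then only sums the
    ranks of the observations where its flag is 1.
    """
    ranks = rank_with_ties(target)
    result = {}
    for name, flags in binary_features.items():
        n1 = sum(flags)
        rank_sum = sum(r for r, f in zip(ranks, flags) if f)
        result[name] = rank_sum - n1 * (n1 + 1) / 2
    return result


# Hypothetical data: six target values and two binary features.
target = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
features = {
    "is_holiday": [0, 0, 0, 1, 1, 1],
    "is_weekend": [1, 0, 1, 0, 1, 0],
}
u_stats = mann_whitney_u_all(target, features)
print(u_stats)
```

With a real dependency stack, the same effect would more likely come from a vectorized library routine; the sketch only shows where the repeated work sits.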

martins0n avatar Sep 14 '22 12:09 martins0n

Waiting for #886.

Mr-Geekman avatar Jun 06 '23 09:06 Mr-Geekman

Closed by #875.

Mr-Geekman avatar Jul 12 '23 06:07 Mr-Geekman