tsfresh
extract_features may produce duplicated columns
The problem:
When I tried to apply extract_features, it produced duplicated columns.
My script is as follows:
extract_features(timeseries, column_id="id", column_sort="time")
(I am sorry that I cannot share the data I used, for confidentiality reasons.)
In my case, the output df had two duplicated __value_count__value_1 columns.
Duplicated features can cause InvalidIndexError when applying tsfresh.utilities.dataframe_functions.impute.
I think this should be fixed.
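To show which column names are affected without sharing the data, a small helper like the one below could be run on the column list. This is only a sketch: find_duplicate_names and the example column names are hypothetical and are not part of tsfresh.

```python
from collections import Counter


def find_duplicate_names(names):
    """Return every name that occurs more than once, mapped to its count."""
    counts = Counter(names)
    return {name: n for name, n in counts.items() if n > 1}


# Hypothetical column list standing in for list(df.columns),
# where df is the output of extract_features.
columns = ["x__value_count__value_1", "x__mean", "x__value_count__value_1"]
duplicates = find_duplicate_names(columns)
print(duplicates)
```

With a real result, pass list(df.columns) to the helper. As a possible stopgap before calling impute, df.loc[:, ~df.columns.duplicated()] in pandas keeps only the first column for each name; this hides the symptom and does not explain why the duplicate was produced.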
Anything else we need to know?:
Environment:
- Python version: 3.7.4
- Operating System: ubuntu 16.04.6 LTS
- tsfresh version: 0.18.0
- Install method (conda, pip, source): pip
Hi @nagiton, thanks for your bug report. Unfortunately, I cannot reproduce the issue. I tried with the example data:
from tsfresh import extract_features
from tsfresh.examples import load_robot_execution_failures
df, _ = load_robot_execution_failures()
df_extracted = extract_features(df, column_id="id", column_sort="time")
# Check for duplicated columns
assert len(list(df_extracted.columns)) == len(set(df_extracted.columns))
I assume that the bug is independent of the actual data, so could you try to reproduce it with a small amount of test data? Make sure to use the same column names as in your original data.
Extracting features from a df returns a feature_df in which some columns have identical values under different column names, e.g. 'temp__maximum' and 'temp__absolute__maximum'. Is it OK to drop these duplicate columns?
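For reference, columns with identical values can be found and dropped with plain pandas. The sketch below uses a small made-up frame in place of a real feature_df; the column names and values are assumptions for illustration only.

```python
import pandas as pd

# Made-up stand-in for a feature_df returned by extract_features.
feature_df = pd.DataFrame(
    {
        "temp__maximum": [1.0, 2.0, 3.0],
        "temp__absolute__maximum": [1.0, 2.0, 3.0],  # same values as above
        "temp__minimum": [0.0, 1.0, 2.0],
    }
)

# Transpose so that each column becomes a row, then flag rows that
# repeat an earlier row. The first occurrence is kept.
is_repeat = feature_df.T.duplicated().to_numpy()
deduped = feature_df.loc[:, ~is_repeat]
print(list(deduped.columns))
```

Whether dropping is safe depends on the use case. Two features that coincide on one dataset (maximum and absolute maximum agree whenever the value of largest magnitude is positive) can differ on other data, so if the features feed a model, the same set of columns should be kept for training and prediction.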