etna
Fix order of columns after TSDataset.to_pandas() and after TSDataset.__init__()
🚀 Feature Request
For consistency, it may be useful to impose the same column order on the dataframe returned by TSDataset.to_dataset() and on the dataframe df constructed during TSDataset.__init__() as the order already imposed on the dataframe returned by TSDataset.to_flatten().
Currently, both the dataframe returned by TSDataset.to_dataset() and TSDataset.df place "target" among the other features in alphabetical order, while the dataframe returned by TSDataset.to_flatten() places "target" after "timestamp" and "segment" and before the other features, which are in alphabetical order.
The order produced by TSDataset.to_flatten() makes the "target" values easier to inspect (they are not hidden among many other features) and emphasises the special role of "target".
Proposal
I propose the following order of columns:
- timestamp,
- segment,
- target,
- other columns in alphabetical order.
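
The proposed order can be sketched as a small pure-Python helper. This is only an illustration: order_columns and LEADING_COLUMNS are hypothetical names, not part of etna, and the helper assumes that leading columns missing from the input (for example "target" in an exog-only frame) are simply skipped.

```python
from typing import Iterable, List

# Columns that always lead the frame, in the order given in the proposal.
LEADING_COLUMNS = ["timestamp", "segment", "target"]


def order_columns(columns: Iterable[str]) -> List[str]:
    """Return names as: timestamp, segment, target, then the rest sorted.

    Hypothetical helper for illustration only. Leading columns that are
    absent from the input are skipped rather than raising an error.
    """
    columns = list(columns)
    leading = [name for name in LEADING_COLUMNS if name in columns]
    rest = sorted(name for name in columns if name not in LEADING_COLUMNS)
    return leading + rest
```

For a flat dataframe df, the result could be applied as df[order_columns(df.columns)].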
How it can be done for TSDataset.to_dataset():
- Find the line
  df_copy = df_copy.pivot(index="timestamp", columns="segment")
  in etna.datasets.tsdataset.py
- Before it, reorder the columns of df_copy so that "target" precedes the other features, if "target" is provided. It should look like
  feature_columns.remove("target")
  followed on the next line by
  df_copy = df_copy[["timestamp", "segment", "target"] + feature_columns]
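
A minimal sketch of this step on a toy long-format frame, assuming pandas is available. The feature names exog_a and exog_b and the way feature_columns is built are assumptions made for illustration. The sketch covers only the reorder and the pivot; whatever the real method does after the pivot is not modelled here.

```python
import pandas as pd

# Toy long-format frame; "exog_a" and "exog_b" are made-up feature names.
df_copy = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2021-01-01", "2021-01-01", "2021-01-02", "2021-01-02"]
        ),
        "segment": ["a", "b", "a", "b"],
        "exog_b": [0.1, 0.2, 0.3, 0.4],
        "target": [1.0, 2.0, 3.0, 4.0],
        "exog_a": [5.0, 6.0, 7.0, 8.0],
    }
)

# Assumption: features are all columns except the two key columns,
# kept in alphabetical order.
feature_columns = sorted(set(df_copy.columns) - {"timestamp", "segment"})
leading = ["timestamp", "segment"]
if "target" in feature_columns:
    # Move "target" in front of the remaining features.
    feature_columns.remove("target")
    leading.append("target")
df_copy = df_copy[leading + feature_columns]

# pivot keeps the value columns in the order they have in df_copy.
wide = df_copy.pivot(index="timestamp", columns="segment")
feature_order = list(wide.columns.get_level_values(0).unique())
```

Guarding the remove call with the membership check avoids a ValueError when "target" is not provided.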
How it can be done for TSDataset.__init__():
- Find the line
  df = pd.concat((df, self.df_exog), axis=1).loc[df.index].sort_index(axis=1, level=(0, 1))
  in etna.datasets.tsdataset.py
- Change it so that "target" comes before the other columns, which remain sorted in alphabetical order.
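
One possible way to express the desired order is a sort key over the two-level column labels. The sketch below works on plain tuples and assumes level 0 is the segment and level 1 is the feature name; target_first_key and order_wide_columns are hypothetical names, not etna functions.

```python
from typing import List, Tuple

# Assumption: (segment, feature), mirroring column levels 0 and 1.
Column = Tuple[str, str]


def target_first_key(column: Column) -> Tuple[str, bool, str]:
    """Sort by segment, then "target" before other features, then by name."""
    segment, feature = column
    # False sorts before True, so "target" leads within each segment.
    return (segment, feature != "target", feature)


def order_wide_columns(columns: List[Column]) -> List[Column]:
    """Return the column labels in the proposed order (illustrative helper)."""
    return sorted(columns, key=target_first_key)
```

The ordered labels could then be used to select the columns of the concatenated frame, in place of relying on sort_index alone.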
Test cases
- Fix the doctest of TSDataset.to_dataset().
- Make sure the current tests pass.
- Add tests on the order of columns for both modified methods to etna.tests.test_datasets.test_dataset.py:
  - test_to_dataset_correct_column_order for TSDataset.to_dataset()
  - test_init_with_exog_correct_column_order for TSDataset.__init__() with df_exog != None
Additional context
See issue #873 for a similar issue concerning TSDataset.to_flatten().