ludwig
Initial time series implementation
This PR contains the following (so far):
- Enable loading time series data from column-major as well as row-major sources, selected with the `column_major` preprocessing option. In the column-major case, all input features must currently be time series (this is checked); the row-major case proceeds as before.
- Enable undersampling of time series signals and direct k-step-ahead prediction (rather than only predicting the next step). For now, undersampling simply keeps every n-th data point.
- Refactor numeric feature transformations into `ludwig/features/feature_transform_utils.py` so they can be shared across feature classes (for now, numerical and time series features).
- Add an RMSE metric.
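To illustrate the data flow described above, here is a minimal plain-Python sketch. It is not the code in this PR and does not use Ludwig's API. It assumes a column-major source is a mapping from column name to one whole signal, and shows how such columns could be cut into row-major windowed samples: undersampling keeps every n-th point, and the target is taken `k` steps after the end of each window (direct k-step-ahead prediction). It also includes an RMSE helper. The names `column_major_to_samples`, `window`, `undersample` and `k` are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def column_major_to_samples(
    columns: Dict[str, Sequence[float]],
    input_cols: List[str],
    target_col: str,
    window: int,
    undersample: int = 1,
    k: int = 1,
) -> List[Tuple[Dict[str, List[float]], float]]:
    """Cut whole-signal columns into row-major (inputs, target) samples.

    Hypothetical helper: each sample holds the last `window` points of every
    input signal (after undersampling) and the target value `k` steps after
    the last point of the window.
    """
    # Undersampling: keep every `undersample`-th data point of each signal.
    sampled = {name: list(values)[::undersample] for name, values in columns.items()}
    length = len(sampled[target_col])
    if any(len(sampled[c]) != length for c in input_cols):
        raise ValueError("all signals must have the same length")

    samples = []
    # The window covers indices [end - window, end); the target sits at end - 1 + k,
    # so `end` may go up to length - k.
    for end in range(window, length - k + 1):
        inputs = {c: sampled[c][end - window:end] for c in input_cols}
        target = sampled[target_col][end - 1 + k]
        samples.append((inputs, target))
    return samples


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


cols = {"Los Angeles": [10, 11, 12, 13, 14, 15, 16, 17], "Seattle": [0, 1, 2, 3, 4, 5, 6, 7]}
samples = column_major_to_samples(cols, ["Los Angeles", "Seattle"], "Los Angeles", window=2, undersample=2, k=1)
```

With `undersample=2` the Los Angeles signal becomes `[10, 12, 14, 16]`, so the first sample pairs the window `[10, 12]` with the next undersampled value `14` as its target.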
Example test case instructions:
- Download and unpack hourly weather data from https://www.kaggle.com/selfishgene/historical-hourly-weather-data
- Run `ludwig experiment --dataset temperature.csv --config_file config.yaml` with the `config.yaml` below.
`config.yaml`:

```yaml
preprocessing:
  column_major: True
  splits_in_order: True
input_features:
  - name: Los Angeles in
    column: Los Angeles
    type: timeseries
    encoder: rnn
    state_size: 32
    preprocessing:
      normalization: zscore
      timeseries_length_limit: 20
      padding_value_strategy: fill_with_mean
      missing_value_strategy: fill_with_mean
      padding: left
  - name: Seattle
    type: timeseries
    encoder: rnn
    state_size: 32
    preprocessing:
      normalization: zscore
      timeseries_length_limit: 20
      padding_value_strategy: fill_with_mean
      missing_value_strategy: fill_with_mean
      padding: left
output_features:
  - name: Los Angeles out
    column: Los Angeles
    type: numerical
    preprocessing:
      normalization: zscore
      missing_value_strategy: fill_with_mean
```
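One possible reading of the per-feature `preprocessing` options in this config, written as a plain-Python sketch rather than Ludwig's actual implementation. The helper names `fit_stats` and `preprocess_window` are hypothetical, and two details are assumptions: the order of operations (fill missing values, truncate, left-pad, then z-score) and that truncation to `timeseries_length_limit` keeps the most recent points.

```python
from statistics import fmean, pstdev
from typing import List, Optional, Tuple


def fit_stats(values: List[Optional[float]]) -> Tuple[float, float]:
    """Mean and population std of the non-missing values (e.g. on the training split)."""
    present = [v for v in values if v is not None]
    return fmean(present), pstdev(present)


def preprocess_window(
    window: List[Optional[float]],
    length_limit: int,
    mu: float,
    sigma: float,
) -> List[float]:
    """Hypothetical per-window preprocessing mirroring the config options."""
    if length_limit < 1:
        raise ValueError("length_limit must be at least 1")
    # missing_value_strategy: fill_with_mean -> replace missing points with the mean.
    filled = [mu if v is None else v for v in window]
    # timeseries_length_limit: keep at most `length_limit` points
    # (assumption: the most recent ones are kept).
    filled = filled[-length_limit:]
    # padding: left + padding_value_strategy: fill_with_mean -> pad at the front with the mean.
    padded = [mu] * (length_limit - len(filled)) + filled
    # normalization: zscore -> (x - mean) / std, guarding against a constant signal.
    scale = sigma if sigma > 0 else 1.0
    return [(x - mu) / scale for x in padded]


mu, sigma = fit_stats([10.0, 14.0, None])
short = preprocess_window([None, 14.0], length_limit=4, mu=mu, sigma=sigma)
long = preprocess_window([10.0, 12.0, 14.0, 16.0, 18.0], length_limit=3, mu=mu, sigma=sigma)
```

Because padding and missing values are both filled with the mean before z-scoring, they map to `0.0` after normalization, which keeps them neutral for the RNN encoder.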
@w4nderlust @tgaddair