
Initial time series implementation

Open nimz opened this issue 4 years ago • 0 comments

This PR contains the following (so far):

  1. Enable loading of time series data from column-major as well as row-major sources (selected by a config option). In the column-major case, every input feature must currently be a time series (this is checked); the row-major case proceeds as before.
  2. Enable undersampling of time series signals and direct k-step-ahead prediction (rather than only predicting the next step). Currently, undersampling simply keeps every n-th data point.
  3. Refactor numeric feature transformations into ludwig/features/feature_transform_utils.py, for ease of use across classes (for now, numerical and time series features).
  4. Add RMSE metric.
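
As a rough illustration of items 2 and 4, the pure-Python sketch below shows what "keep every n-th point", direct k-step-ahead targets, and RMSE could look like. The function names (`undersample`, `make_windows`, `rmse`) and their parameters are hypothetical and chosen for this example; they are not taken from the PR's code, which may structure this differently.

```python
import math
from typing import List, Sequence, Tuple


def undersample(series: Sequence[float], n: int) -> List[float]:
    """Keep every n-th data point, starting from the first one."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return list(series[::n])


def make_windows(
    series: Sequence[float], length: int, k: int
) -> List[Tuple[List[float], float]]:
    """Pair each window of `length` points with the value k steps after its end.

    k=1 corresponds to ordinary next-step prediction.
    """
    if length < 1 or k < 1:
        raise ValueError("length and k must be positive integers")
    pairs = []
    for start in range(len(series) - length - k + 1):
        window = list(series[start:start + length])
        # Index of the last point in the window, shifted k steps ahead.
        target = series[start + length - 1 + k]
        pairs.append((window, target))
    return pairs


def rmse(targets: Sequence[float], predictions: Sequence[float]) -> float:
    """Root mean squared error between two equal-length sequences."""
    if len(targets) != len(predictions) or len(targets) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(targets, predictions)]
    return math.sqrt(sum(squared) / len(squared))


# Hourly readings reduced to one reading every 3 hours, then windowed.
hourly = [float(h) for h in range(24)]
coarse = undersample(hourly, 3)
pairs = make_windows(coarse, length=4, k=2)
```

Undersampling before windowing means that k is counted in undersampled steps, so `k=2` after `n=3` looks six original time steps ahead.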

Example test case instructions:

  • Download and unpack hourly weather data from https://www.kaggle.com/selfishgene/historical-hourly-weather-data
  • Run ludwig experiment --dataset temperature.csv --config_file config.yaml with config.yaml below

config.yaml:

preprocessing:
        column_major: True
        splits_in_order: True

input_features:
    -
        name: Los Angeles in
        column: Los Angeles
        type: timeseries
        encoder: rnn
        state_size: 32
        preprocessing:
                normalization: zscore
                timeseries_length_limit: 20
                padding_value_strategy: fill_with_mean
                missing_value_strategy: fill_with_mean
                padding: left
    -
        name: Seattle
        type: timeseries
        encoder: rnn
        state_size: 32
        preprocessing:
                normalization: zscore
                timeseries_length_limit: 20
                padding_value_strategy: fill_with_mean
                missing_value_strategy: fill_with_mean
                padding: left

output_features:
    -
        name: Los Angeles out
        column: Los Angeles
        type: numerical
        preprocessing:
                normalization: zscore
                missing_value_strategy: fill_with_mean
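
The column-major restriction from item 1 could be checked along the following lines. This is only a sketch of the idea on a plain dictionary that mirrors part of the config above; `check_column_major` is a hypothetical helper name, and the PR's actual check may live elsewhere and behave differently.

```python
from typing import Any, Dict, List


def check_column_major(config: Dict[str, Any]) -> None:
    """Raise if column-major loading is requested with non-time-series inputs."""
    preprocessing = config.get("preprocessing", {})
    if not preprocessing.get("column_major", False):
        return  # Row-major data needs no extra check.
    offending: List[str] = [
        feature["name"]
        for feature in config.get("input_features", [])
        if feature.get("type") != "timeseries"
    ]
    if offending:
        raise ValueError(
            f"column_major requires all inputs to be timeseries, got: {offending}"
        )


# Dictionary form of the relevant parts of config.yaml (encoder and
# per-feature preprocessing options omitted for brevity).
config = {
    "preprocessing": {"column_major": True, "splits_in_order": True},
    "input_features": [
        {"name": "Los Angeles in", "column": "Los Angeles", "type": "timeseries"},
        {"name": "Seattle", "type": "timeseries"},
    ],
}
check_column_major(config)  # Passes silently: both inputs are time series.
```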

@w4nderlust @tgaddair

nimz, Mar 27 '21 14:03