
feat: Added frequency aware one-hot and relative cyclic encoding.

Open konsram opened this issue 4 months ago • 3 comments

Checklist before merging this PR:

  • [x] Mentioned all issues that this PR fixes or addresses.
  • [x] Summarized the updates of this PR under Summary.
  • [ ] Added an entry under Unreleased in the Changelog.

Fixes #2842.

Summary

Changes in this PR give users more options for encoding datetime attributes. These include:

  1. Fixing the inconsistent behavior of cyclic encodings. Previously, day was encoded relative to the number of days in its month, while other attributes with a variable maximum (dayofyear, day_of_year, week, weekofyear, week_of_year) were encoded relative to the maximum observed on the given time index.
  2. Adding frequency-aware one-hot encodings for datetime attributes. Previously, one-hot encodings always covered all possible values of an attribute (e.g. 60 values for minute). Users now have the option to use a frequency-aware one-hot encoding instead, which derives the possible values from the start and the frequency of the time index (e.g. (0, 15, 30, 45) when the start is aligned to the full hour and the frequency is 15min). This reduces the number of covariates, which may be critical for models that cannot handle high-dimensional feature spaces.
  3. Adding a OneHotTemporalEncoder class, which uses the functionality from (2) and integrates into SequentialEncoder. This requires changing the attributes available to SingleEncoder (encoders must be aware of the frequency and start time of the data).
  4. Extending the CyclicTemporalEncoder to reflect the changes in (1).
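To illustrate the relative cyclic encoding described in (1) and (4), here is a minimal sketch, assuming that "relative" means mapping each value onto a circle whose period is the actual length of the enclosing cycle (the number of days in that particular month, or 365/366 days in that year). The helper `relative_cyclic_encoding` is hypothetical and is not the darts API; it only relies on standard pandas `DatetimeIndex` attributes.

```python
import numpy as np
import pandas as pd


def relative_cyclic_encoding(index: pd.DatetimeIndex, attribute: str) -> pd.DataFrame:
    """Hypothetical helper (not the darts API): sine/cosine encoding of a
    datetime attribute relative to the real length of its enclosing cycle."""
    if attribute == "day":
        # Shift the 1-based day of month to 0-based; period = days in that month
        values = index.day.to_numpy() - 1
        period = index.days_in_month.to_numpy()
    elif attribute == "dayofyear":
        # Shift the 1-based day of year to 0-based; period = 365 or 366 days
        values = index.dayofyear.to_numpy() - 1
        period = np.where(index.is_leap_year, 366, 365)
    else:
        raise ValueError(f"unsupported attribute: {attribute}")
    # One full turn of the circle corresponds to one full month/year
    angle = 2 * np.pi * values / period
    return pd.DataFrame(
        {f"{attribute}_sin": np.sin(angle), f"{attribute}_cos": np.cos(angle)},
        index=index,
    )


# Usage: 28 Feb, 29 Feb and 1 Mar 2024 (a leap year)
idx = pd.date_range("2024-02-28", periods=3, freq="D")
print(relative_cyclic_encoding(idx, "day"))
```

With this choice, the first day of every month maps to the same angle (sin 0, cos 1) and each month wraps around exactly at its real end, regardless of whether it has 28, 29, 30 or 31 days.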

Other Information

Draft Progress

| Change | Implementation | Tests | Documentation |
| --- | --- | --- | --- |
| Inconsistent Cyclic Encoding | :heavy_check_mark: | :x: | :x: |
| Frequency Aware One-Hot-Encoding | :heavy_check_mark: | :x: | :x: |
| OneHotTemporalEncoder | :x: | :x: | :x: |
| Changes to CyclicTemporalEncoder | :x: | :x: | :x: |

konsram, Sep 07 '25 15:09

@dennisbader What do you think about the new options for encodings mentioned in (1) and (2)? The frequency awareness for one-hot encodings does not cover all possible frequencies, but I tried to address common scenarios, and users can always use the frequency-unaware version if needed.
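The frequency-aware one-hot encoding from (2) can be sketched as follows. This is a hypothetical illustration, not the PR's implementation: `frequency_aware_categories` enumerates timestamps from the index start with the index frequency over an assumed lookahead `horizon` (one day by default) and keeps the distinct attribute values that occur; `frequency_aware_one_hot` then builds one column per such value. The horizon has to cover a full cycle of the attribute (for `month` it would need to span about a year), which mirrors the caveat that not every frequency is handled automatically.

```python
import numpy as np
import pandas as pd


def frequency_aware_categories(start, freq, attribute, horizon="1D"):
    """Hypothetical helper: distinct values `attribute` takes on a regular
    index that begins at `start` and advances by `freq`.

    `horizon` is an assumed lookahead window; it must span a full cycle
    of the attribute for the result to be complete.
    """
    start = pd.Timestamp(start)
    # Enumerate the timestamps the index would hit within the window
    window = pd.date_range(
        start, start + pd.Timedelta(horizon), freq=freq, inclusive="left"
    )
    return np.unique(getattr(window, attribute))


def frequency_aware_one_hot(index, attribute, horizon="1D"):
    """One column per value that can actually occur, instead of all values."""
    # Requires a regular index with `freq` set (e.g. from pd.date_range)
    categories = frequency_aware_categories(index[0], index.freq, attribute, horizon)
    values = np.asarray(getattr(index, attribute))
    # Broadcast comparison: rows = timestamps, columns = categories
    encoded = (values[:, None] == categories[None, :]).astype(float)
    columns = [f"{attribute}_{c}" for c in categories]
    return pd.DataFrame(encoded, index=index, columns=columns)


# Usage: 15min index starting on the full hour
idx = pd.date_range("2025-01-01 00:00", periods=8, freq="15min")
print(frequency_aware_one_hot(idx, "minute").columns.tolist())
```

For a 15min index starting on the full hour this yields four minute columns (minute_0, minute_15, minute_30, minute_45) instead of 60; shifting the start to 00:05 shifts the categories to 5, 20, 35 and 50.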

Do you think changing the information provided to SingleEncoders is a viable approach for (3) and (4)?

Regarding my comment in #2842: currently I am using the raw values. What do you think about this?

konsram, Sep 07 '25 15:09

Codecov Report

:x: Patch coverage is 86.04651% with 12 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 95.12%. Comparing base (8821f51) to head (924448c). :warning: Report is 27 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| darts/utils/timeseries_generation.py | 86.04% | 12 Missing :warning: |
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2893      +/-   ##
==========================================
- Coverage   95.27%   95.12%   -0.15%     
==========================================
  Files         146      146              
  Lines       15588    15640      +52     
==========================================
+ Hits        14851    14878      +27     
- Misses        737      762      +25     


codecov[bot], Sep 12 '25 14:09

@dennisbader, what do you think about the proposed changes, especially making the encoders aware of a time series' frequency and start? Do you see any drawbacks to this approach?

konsram, Oct 02 '25 09:10