Replace `MeanSegmentEncoderTransform` with `MeanEncoderTransform`

Open Mr-Geekman opened this issue 2 years ago • 0 comments

🚀 Feature Request

We can make a class MeanEncoderTransform that works with any feature. In that case, MeanSegmentEncoderTransform can be replaced with SegmentEncoderTransform + MeanEncoderTransform.

Motivation

It will make the API more flexible and add new functionality to the encoders.

Proposal

  1. Implement MeanEncoderTransform
  2. Add a deprecation warning to MeanSegmentEncoderTransform until etna-3.0.
  3. Remove MeanSegmentEncoderTransform from examples.
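
Step 2 could look roughly like the sketch below. It uses only the standard `warnings` module. The class names `MeanEncoderSketch` and `MeanSegmentEncoderSketch` are hypothetical stand-ins, not the real etna classes, and the column names `segment_code` and `segment_mean` are assumptions made for illustration.

```python
import warnings


class MeanEncoderSketch:
    """Hypothetical stand-in for the proposed MeanEncoderTransform."""

    def __init__(self, in_column: str, out_column: str):
        self.in_column = in_column
        self.out_column = out_column


class MeanSegmentEncoderSketch(MeanEncoderSketch):
    """Hypothetical stand-in for the legacy MeanSegmentEncoderTransform."""

    def __init__(self):
        # Warn at construction time; stacklevel=2 points at the caller's line.
        warnings.warn(
            "MeanSegmentEncoderTransform is deprecated and will be removed in etna 3.0; "
            "use SegmentEncoderTransform followed by MeanEncoderTransform instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Assumed column names, for illustration only.
        super().__init__(in_column="segment_code", out_column="segment_mean")
```

Keeping the legacy class as a thin wrapper means existing pipelines keep working while users see the warning.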

This transform should:

  1. Order the table by (timestamp, segment) to avoid target leakage.
  2. Compute the mean encoding based on the given feature after ordering.
  3. Work correctly with NaNs in the feature (use the global running mean).
  4. Work correctly with new values of the feature (use the global running mean).
  5. Add additive smoothing, as in CatBoostEncoder.
  6. In the prediction phase, use the mean calculated over the whole history.
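
To make items 3-6 concrete, here is a minimal pure-Python sketch of the prediction phase. It assumes the smoothing takes the form `(sum + a * prior) / (count + a)` with the global mean as the prior, which is my reading of "like in CatBoostEncoder". The function names and the `smoothing` parameter are hypothetical and not part of etna.

```python
import math
from collections import defaultdict


def is_missing(value):
    """True for None and float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def smoothed_mean(total, count, global_mean, smoothing=1.0):
    """Additive smoothing: shrink the category mean towards the global mean."""
    return (total + smoothing * global_mean) / (count + smoothing)


def fit_history(categories, targets):
    """Collect per-category sums/counts and the global mean over the whole history."""
    sums, counts = defaultdict(float), defaultdict(int)
    total, n = 0.0, 0
    for category, target in zip(categories, targets):
        if is_missing(target):
            continue
        total += target
        n += 1
        if not is_missing(category):
            sums[category] += target
            counts[category] += 1
    global_mean = total / n if n else float("nan")
    return sums, counts, global_mean


def encode_for_prediction(categories, sums, counts, global_mean, smoothing=1.0):
    """Encode future rows; NaN and unseen categories fall back to the global mean."""
    encoded = []
    for category in categories:
        if is_missing(category) or category not in counts:
            encoded.append(global_mean)
        else:
            encoded.append(
                smoothed_mean(sums[category], counts[category], global_mean, smoothing)
            )
    return encoded


# History: category "a" has targets 1 and 3, "b" has 5, one row has a NaN feature.
sums, counts, global_mean = fit_history(["a", "a", "b", float("nan")], [1.0, 3.0, 5.0, 7.0])
encoded = encode_for_prediction(["a", "b", "c", float("nan")], sums, counts, global_mean)
# global_mean is 4.0; "a" -> (4 + 4) / 3, "b" -> (5 + 4) / 2 = 4.5, "c" and NaN -> 4.0
```

With `smoothing=0` this reduces to the plain category mean; larger values pull rare categories closer to the global mean.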

Transform should support two modes (like SklearnTransform):

  • macro: calculate running mean across all segments
  • micro (per-segment): calculate running mean across each segment individually

Notes:

  1. There are a few ways to implement this without leakage:
  • For the encoding of the current row, use all rows with a timestamp earlier than that of the current row. This option seems more difficult to implement in a vectorized fashion.
  • For the encoding of the current row, use all rows up to and including the current one, then, after all encoding is done, apply lag-1 in each segment to get rid of the leakage. This option seems easier to implement.
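
As a reference for the first option, the following non-vectorized sketch encodes each row using only rows with a strictly earlier timestamp, and supports both modes. It is meant as a correctness baseline rather than an efficient implementation. The function name, the tuple layout of `rows`, and the choice to return NaN when no history exists are assumptions. In micro mode, the fallback "global" mean is taken to be the running mean of the target within the same segment.

```python
import math
from collections import defaultdict
from itertools import groupby


def is_missing(value):
    """True for None and float NaN."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def running_mean_encode(rows, mode="macro", smoothing=1.0):
    """Encode rows of (timestamp, segment, category, target) without leakage.

    Returns a dict mapping (timestamp, segment) to the encoded value.
    """
    if mode not in ("macro", "micro"):
        raise ValueError("mode must be 'macro' or 'micro'")
    sums, counts = defaultdict(float), defaultdict(int)
    scope_sum, scope_count = defaultdict(float), defaultdict(int)
    encoded = {}
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for _, batch in groupby(ordered, key=lambda r: r[0]):
        batch = list(batch)
        # First encode every row of this timestamp from strictly earlier rows.
        for timestamp, segment, category, _ in batch:
            scope = segment if mode == "micro" else None
            n = scope_count[scope]
            if n == 0:
                encoded[(timestamp, segment)] = float("nan")  # no history yet
                continue
            prior = scope_sum[scope] / n
            key = (scope, category)
            if is_missing(category) or counts.get(key, 0) == 0:
                encoded[(timestamp, segment)] = prior  # NaN or unseen value
            else:
                encoded[(timestamp, segment)] = (
                    (sums[key] + smoothing * prior) / (counts[key] + smoothing)
                )
        # Only then add this timestamp's targets to the statistics.
        for _, segment, category, target in batch:
            if is_missing(target):
                continue
            scope = segment if mode == "micro" else None
            scope_sum[scope] += target
            scope_count[scope] += 1
            if not is_missing(category):
                sums[(scope, category)] += target
                counts[(scope, category)] += 1
    return encoded


rows = [
    (1, "A", "x", 1.0), (1, "B", "x", 3.0),
    (2, "A", "x", 5.0), (2, "B", "y", 7.0),
    (3, "A", "y", 9.0), (3, "B", float("nan"), 11.0),
]
macro = running_mean_encode(rows, mode="macro")
micro = running_mean_encode(rows, mode="micro")
# macro[(3, "A")] == 5.5: "y" was seen once (target 7), prior is 16 / 4 = 4
# micro[(3, "A")] == 3.0: "y" is new within segment A, so the segment mean (1 + 5) / 2 is used
```

Because statistics are updated only after a whole timestamp has been encoded, rows that share a timestamp never see each other's targets, which matters in macro mode.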

Test cases

  1. Test that it works fine if there are no NaNs and unknown values on train
  2. Test that it works fine if there are no NaNs and unknown values on test
  3. Test that it works fine if there are NaNs on train/test
  4. Test that it works fine if there are unknown values on test
  5. Make sure that combination of SegmentEncoderTransform + MeanEncoderTransform works fine.

Alternatives

No response

Additional context

No response

Checklist

  • [X] I discussed this issue with ETNA Team

Mr-Geekman · Jun 14 '22 14:06