
[FIX] Unify API

Open elephaint opened this issue 1 year ago • 6 comments

This is a large refactoring PR, open for discussion. Its main goal is to unify the API across different model types, and to unify loss functions across different loss types.

Refactoring:

  • Fuses BaseWindows, BaseMultivariate and BaseRecurrent into a single BaseModel, removing the need for separate base classes and unifying the model API across model types. Instead, this PR introduces two model attributes, RECURRENT (True/False) and MULTIVARIATE (True/False), yielding four possible model combinations. We currently have a model for every combination except a recurrent multivariate model (e.g. a multivariate LSTM), but one is now relatively simple to add. This change also allows a model to be recurrent or not, or multivariate or not, on the fly based on users' input, which makes future modelling work easier.
  • Unifies model API across all models, adding missing input variables to all model types.
  • Refactors losses, among other things removing unnecessary domain_map functions.
  • Moves loss.domain_map outside of models to BaseModel
  • Moves RevINMultivariate used by TSMixer, TSMixerx and RMoK to common.modules
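The shape of the unified base class can be sketched roughly as follows. This is a minimal illustration of the flag-based design, not neuralforecast's actual internals; the class and method names here are hypothetical:

```python
class BaseModel:
    """Illustrative unified base class; attribute and method names are hypothetical."""
    RECURRENT = False      # decode the horizon step by step?
    MULTIVARIATE = False   # model all series jointly?

    def forecast(self, history, horizon):
        if self.RECURRENT:
            history = list(history)
            preds = []
            for _ in range(horizon):
                step = self.step(history)   # predict one step ahead
                preds.append(step)
                history.append(step)        # feed the prediction back in
            return preds
        return self.direct(history, horizon)  # direct: emit the whole horizon at once


class NaiveRecurrent(BaseModel):
    RECURRENT = True

    def step(self, history):
        return history[-1]


class NaiveDirect(BaseModel):
    def direct(self, history, horizon):
        return [history[-1]] * horizon
```

With this layout, a single `forecast` entry point serves every combination of flags, which is what makes a model "recurrent or not, on the fly" a matter of flipping one attribute.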

Features:

  • All losses are compatible with all model types (e.g. univariate/multivariate, direct/recurrent), or an appropriate guard has been added.
  • DistributionLoss now supports the use of quantiles in predict, allowing for easy quantile retrieval for all DistributionLosses.
  • Mixture losses (GMM, PMM and NBMM) now support learned weights for weighted mixture distribution outputs.
  • Mixture losses now support the use of quantiles in predict, allowing for easy quantile retrieval.
  • Improved stability of ISQF by adding a softplus protection around some parameters instead of using .abs.
  • Unified API for any quantile or any confidence level during predict for both point- and distribution losses.
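To illustrate how arbitrary quantiles can be read off a distributional output at predict time: a mixture distribution (e.g. a GMM with learned weights) has no closed-form quantile function, but its CDF is monotone, so any quantile can be recovered by numeric inversion. A plain-Python sketch, not the PR's torch implementation:

```python
import math

def norm_cdf(x, mu, sigma):
    """CDF of a normal distribution via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def mixture_quantile(q, weights, mus, sigmas, lo=-100.0, hi=100.0, iters=80):
    """Invert the mixture CDF by bisection to obtain the q-th quantile.

    `weights` would come from a softmax over learned logits in a model
    with weighted mixture outputs; here they are just given.
    """
    def cdf(x):
        return sum(w * norm_cdf(x, m, s) for w, m, s in zip(weights, mus, sigmas))
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cdf(mid) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# e.g. the 97.5% quantile of a standard normal (a one-component "mixture"):
mixture_quantile(0.975, [1.0], [0.0], [1.0])  # ~1.96
```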

Bug fixes:

  • MASE loss now works.
  • Added various protections around parameter combinations that are invalid (e.g. regarding losses).
  • StudentT: increased the default degrees of freedom to 3; the variance of a Student's t-distribution is ν/(ν−2), which is undefined for ν ≤ 2, so very low DoF can cause unbounded-variance issues.
  • All models are now tested using a test function on the AirPassengers dataset; previously, most example notebooks were marked eval: false while no other tests existed, so most models were effectively not tested at all.
  • IQLoss previously could give non-monotonic (crossing) quantiles; it now produces monotonic quantiles by quantiling the quantiles.
  • When training with both a conformal method and a non-conformal method, the latter was also cross-validated to compute conformity scores. This redundant training step is removed in this PR.
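On the IQLoss fix: "quantiling the quantiles" amounts to rearranging the predicted quantile values in increasing order, which for matching target probabilities is simply a sort along the quantile axis. A rough numpy illustration (not the PR's code):

```python
import numpy as np

def enforce_monotonic(quantile_preds):
    """Make quantile predictions non-decreasing along the quantile axis.

    Taking empirical quantiles of the predicted quantiles at the same
    probability levels is equivalent to sorting them, which guarantees
    the q=0.1 prediction never exceeds the q=0.5 prediction, and so on.
    """
    return np.sort(quantile_preds, axis=-1)

raw = np.array([[10.2, 9.8, 11.0]])   # predictions for q=0.1, 0.5, 0.9 -- they cross
fixed = enforce_monotonic(raw)        # [[9.8, 10.2, 11.0]]
```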

Breaking changes:

  • Rewrite of all recurrent models to remove the quadratic (in the sequence dimension) space complexity. As a result, a recurrent model saved with a previous version cannot be loaded into this version.
  • Recurrent models now require an input_size to be given.
  • TCN and DRNN are now windows models, not recurrent models.
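To make the space-complexity point concrete: the old approach materialized an input window for every time step, so memory grew with the number of steps times the window length (quadratic when the window scales with the sequence). A recurrent rewrite only needs a rolling hidden state, constant in the sequence dimension. A toy sketch of the two memory footprints (the cell update here is a stand-in, not an actual RNN):

```python
import numpy as np

T = 8           # sequence length
input_size = 4  # window length; in the worst case this scales with T -> O(T^2) memory

# Window-based layout (illustrative): one window per time step is materialized.
windows = np.stack([np.arange(t, t + input_size) for t in range(T)])
print(windows.shape)  # (8, 4) -- grows with T * input_size

# Recurrent layout (illustrative): only a fixed-size hidden state is kept.
hidden = np.zeros(input_size)
for t in range(T):
    hidden = 0.9 * hidden + 0.1 * t  # stand-in for an RNN cell update
print(hidden.shape)  # (4,) -- constant in T
```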

Tests:

  • Added common._model_checks.py that includes a model testing function. This function runs on every separate model, ensuring that every model is tested on push.
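A shared model-check helper in this spirit might look like the following. This is a hedged sketch: `check_model` and `NaiveMean` are dummies invented for illustration, and the series is just the first twelve AirPassengers values; the real common._model_checks.py runs actual neuralforecast models.

```python
def check_model(model_cls, y, horizon=12):
    """Fit a model on a small series and sanity-check its forecasts."""
    model = model_cls()
    model.fit(y)
    forecast = model.predict(horizon)
    assert len(forecast) == horizon, "wrong forecast length"
    assert all(f == f for f in forecast), "forecast contains NaNs"
    return forecast

class NaiveMean:
    """Dummy stand-in for a real model, used only to exercise the checker."""
    def fit(self, y):
        self.mean = sum(y) / len(y)
    def predict(self, horizon):
        return [self.mean] * horizon

# First year of the classic AirPassengers series.
air_passengers = [112, 118, 132, 129, 121, 135, 148, 148, 136, 119, 104, 118]
check_model(NaiveMean, air_passengers)
```

Running one such function per model on every push is what turns the previously untested examples into an actual regression gate.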

Todo:

  • [x] Test models on speed/scaling as compared to current implementation across a set of datasets.
  • [x] Make sure docstring of all multivariate models is updated to reflect the additional inputs

elephaint avatar May 31 '24 17:05 elephaint


this is a very cool effort @elephaint. the new features look exciting (eg losses compatibility, model unification, ...) and there are a lot of bug fixes, too!

AzulGarza avatar Oct 04 '24 01:10 AzulGarza

this is a very cool effort @elephaint. the new features look exciting (eg losses compatibility, model unification, ...) and there are a lot of bug fixes, too!

Thanks! At this moment the issue is mostly that there is some performance regression; a little bit of work to do there...

elephaint avatar Oct 08 '24 11:10 elephaint

Performance tests - first picture is baseline (existing main repo), second picture is this PR.

Conclusion

  • There is some performance regression on some recurrent models. This is due to the different implementation, in which recurrent models are now flexibly implemented as either direct or recurrent (they have a flag for setting this; by default they run as direct). I'm ok with this performance regression, as it offers massive gains in space complexity, which is the key benefit of / reason to use a recurrent model.
  • Larger models (multivariate, Transformers) seem to be faster with this PR's implementation.

Model performance

[screenshot 2024-10-08: baseline] [screenshot 2024-10-11: this PR]

Model performance 2

[screenshot 2024-10-08: baseline] [screenshot 2024-10-11: this PR]

Multivariate model performance

[screenshot 2024-10-08: baseline] [screenshot 2024-10-11: this PR]

elephaint avatar Oct 08 '24 15:10 elephaint

I ran my own small experiment, and models are slightly faster, with no degradation in performance. The refactoring looks good to me, but a second opinion would be great on this!

marcopeix avatar Oct 15 '24 14:10 marcopeix

I ran my own small experiment, and models are slightly faster, with no degradation in performance. The refactoring looks good to me, but a second opinion would be great on this!

Thanks!

elephaint avatar Oct 15 '24 15:10 elephaint

Todo:

  • [x] Check correct functioning of #1233 and add a test for that functionality in the code
  • [x] ~~Add #1233 to mixture losses~~ Not in this PR, keep it in backlog
  • [x] Check correct functioning of horizon_weight in other losses

elephaint avatar Jan 31 '25 15:01 elephaint