Greg DeVos

26 comments

@hrzn I asked lucidrains about it and here is his [response](https://github.com/lucidrains/x-transformers/issues/101). Let me know what you think and how we should add it.

I removed ScaleNorm as suggested by lucidrains

@hrzn The three variants are now `LayerNorm`, `LayerNormNoBias` and `RMSNorm`. I will post a comparison on the sunspot dataset shortly.
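For reference, a minimal sketch of what the three variants compute, in plain PyTorch. The `RMSNorm` class here is illustrative rather than the exact code in the PR (lucidrains' version differs slightly), and `nn.LayerNorm(bias=False)` requires PyTorch >= 2.1:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square norm: rescales by the RMS of the features with a
    learned gain, but does no centering and has no bias term."""
    def __init__(self, dim: int, eps: float = 1e-8):
        super().__init__()
        self.eps = eps
        self.gain = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = x.pow(2).mean(dim=-1, keepdim=True).add(self.eps).sqrt()
        return x / rms * self.gain

layer_norm = nn.LayerNorm(64)              # center + scale, learned gain and bias
no_bias = nn.LayerNorm(64, bias=False)     # LayerNormNoBias (PyTorch >= 2.1)
rms_norm = RMSNorm(64)                     # scale only, no centering, no bias
```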

I definitely went down a rabbit hole trying to make these graphs. Here are 9 examples using relatively small models (~32k params).

![Backtest_batch_size=128,norm_type=LayerNorm](https://user-images.githubusercontent.com/15316026/186451438-7b0e4540-7291-4974-9ef9-5094cec25c04.png)
![Backtest_batch_size=128,norm_type=LayerNormNoBias](https://user-images.githubusercontent.com/15316026/186451879-70bb062e-d713-43e4-bf27-6342e784f766.png)
![Backtest_batch_size=128,norm_type=RMSNorm](https://user-images.githubusercontent.com/15316026/186451896-98d4361f-f7cc-4650-b14d-d971104b7114.png)
![Backtest_batch_size=256,norm_type=LayerNorm](https://user-images.githubusercontent.com/15316026/186451925-ff107266-cc52-4ed1-aeb1-af59eee97144.png)
![Backtest_batch_size=256,norm_type=LayerNormNoBias](https://user-images.githubusercontent.com/15316026/186451947-a05e5ac9-9e66-4895-93d0-d0cdd89c89b4.png)
![Backtest_batch_size=256,norm_type=RMSNorm](https://user-images.githubusercontent.com/15316026/186452037-d4e2ed9e-53ce-4c0a-bb1e-1284c54f4e2b.png)
![Backtest_batch_size=512,norm_type=LayerNorm](https://user-images.githubusercontent.com/15316026/186452160-0b9daee3-cab2-494b-a48d-bc1d92a35c01.png)
![Backtest_batch_size=512,norm_type=LayerNormNoBias](https://user-images.githubusercontent.com/15316026/186452190-5890d342-16c0-4899-b29f-a1d0940dc2db.png)
...
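For anyone wanting to reproduce a sweep like this, something along these lines should work. This is a sketch only: it assumes the `norm_type` argument added in this PR, and the chunk lengths, epochs, and backtest settings are placeholders rather than the exact configuration behind the plots above:

```python
from itertools import product

from darts.datasets import SunspotsDataset
from darts.models import TransformerModel

series = SunspotsDataset().load()
train, val = series.split_after(0.8)

# Sweep batch size and norm type; hyperparameters here are illustrative.
for batch_size, norm_type in product(
    [128, 256, 512], ["LayerNorm", "LayerNormNoBias", "RMSNorm"]
):
    model = TransformerModel(
        input_chunk_length=36,
        output_chunk_length=12,
        batch_size=batch_size,
        norm_type=norm_type,  # parameter added in this PR
        n_epochs=100,
    )
    model.fit(train, val_series=val)
    # backtest() returns the average error over historical forecasts;
    # retrain=False reuses the trained model at each step.
    score = model.backtest(series, start=0.8, forecast_horizon=12, retrain=False)
    print(batch_size, norm_type, score)
```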

@hrzn I added it to the `TransformerModel`. The implementation is a bit clunky, but supporting both the GLU variants and the layer norm options independently while maintaining the default behavior was a...
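Usage would look roughly like this, assuming the parameter names from this PR (`activation` selecting the GLU variant, `norm_type` selecting the norm); treat the exact names and accepted values as illustrative:

```python
from darts.models import TransformerModel

# Defaults unchanged: plain LayerNorm and a standard feed-forward block.
default_model = TransformerModel(input_chunk_length=36, output_chunk_length=12)

# The two options can be set independently of each other.
glu_model = TransformerModel(
    input_chunk_length=36,
    output_chunk_length=12,
    activation="SwiGLU",   # GLU feed-forward variant (assumed name)
    norm_type="RMSNorm",   # norm variant (assumed name)
)
```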

@hrzn We are good to merge! Sorry I took so long adding the tests. We are still trying to move :(

1. Adding early stopping can help cut down the training time. Is there a validation set you can use?

You can use the PyTorch Lightning early-stopping callback:

```python
from pytorch_lightning.callbacks.early_stopping import EarlyStopping

from darts.models import NBEATSModel

my_stopper = EarlyStopping(
    monitor="val_loss",
    patience=5,
    min_delta=0.05,
    mode="min",
)
pl_trainer_kwargs = {"callbacks": [my_stopper]}

model = NBEATSModel(..., pl_trainer_kwargs=pl_trainer_kwargs)
model.fit(
    series=train,
    val_series=val,
    past_covariates=train_covariates,
    val_past_covariates=val_covariates,
)
```

Could you post the code used to create `series1` and `series2`?

I would try breaking each step into separate lines. One of those function calls is returning `None`.
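For example (all names below are hypothetical, just to show the pattern):

```python
# One chained expression: if any call returns None, the AttributeError
# points at the whole line and you can't tell which call failed.
result = load_series(path).resample("D").fillna()

# Broken into steps, the failing call is obvious from the traceback:
series = load_series(path)
print(type(series))        # None here means load_series is the culprit
resampled = series.resample("D")
print(type(resampled))
result = resampled.fillna()
```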