Greg DeVos
@hrzn I asked lucidrains about it and here is his [response](https://github.com/lucidrains/x-transformers/issues/101). Let me know what you think and how we want to add it.
I removed ScaleNorm as suggested by lucidrains
@hrzn The three variants are now `LayerNorm`, `LayerNormNoBias` and `RMSNorm`. I will post comparison on the sunspot dataset shortly.
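For anyone following along, here is roughly how the two non-default variants differ from a plain `nn.LayerNorm` (just a sketch, not the exact code in the PR):

```python
import torch
import torch.nn as nn


class LayerNormNoBias(nn.Module):
    """Standard layer norm with a learnable scale but no learnable bias."""

    def __init__(self, dim, eps=1e-5):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        mean = x.mean(dim=-1, keepdim=True)
        var = x.var(dim=-1, keepdim=True, unbiased=False)
        return (x - mean) / torch.sqrt(var + self.eps) * self.weight


class RMSNorm(nn.Module):
    """Scales by the root-mean-square only; no centering, no bias."""

    def __init__(self, dim, eps=1e-8):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x):
        return x * torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps) * self.weight
```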
I definitely went down a rabbit hole trying to make these graphs. Here are 9 examples using relatively small models (~32k params).        ...
@hrzn I added it to the `TransformerModel`. The implementation is a bit clunky, but supporting both the GLU variants and the layer norm options individually while maintaining the default behavior was a...
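Usage ends up looking roughly like this (see the diff for the exact argument names and allowed values; the defaults keep the previous behavior):

```python
from darts.models import TransformerModel

# Illustrative only -- pick a GLU feed-forward variant and a norm variant independently
model = TransformerModel(
    input_chunk_length=24,
    output_chunk_length=12,
    activation="SwiGLU",    # GLU feed-forward variant
    norm_type="RMSNorm",    # or "LayerNorm" (default) / "LayerNormNoBias"
)
```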
@hrzn We are good to merge! Sorry I took so long adding the tests. We are still trying to move :(
1. Adding an early stop can help cut down the training time. Is there a validation set you can use?
You can use the PyTorch Lightning early stopping callback:

```python
from pytorch_lightning.callbacks import EarlyStopping

from darts.models import NBEATSModel

# Stop training once the validation loss stops improving
my_stopper = EarlyStopping(
    monitor="val_loss",
    patience=5,
    min_delta=0.05,
    mode="min",
)
pl_trainer_kwargs = {"callbacks": [my_stopper]}

model = NBEATSModel(..., pl_trainer_kwargs=pl_trainer_kwargs)
model.fit(
    series=train,
    val_series=val,
    past_covariates=train_covariates,
    val_past_covariates=val_covariates,
)
```
Could you post the code used to create series1 and series2?
I would try breaking each step into separate lines. One of those function calls is returning None.
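Something like this toy example (not your code) shows what I mean:

```python
data = {"a": [1, 2, 3]}

# value = data.get("b").pop()   # AttributeError: 'NoneType' object has no attribute 'pop'

# Broken into separate lines, the failing step becomes obvious:
column = data.get("b")
print(column)   # -> None, so .get("b") is the step that returned None
value = column.pop() if column is not None else None
```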