Federico Andres Lois
Yes. I will let @ayende try this one out on the Kobo dataset before changing it...
Now, it is ready to be merged. @arekpalinski
From my experience in reinforcement learning, I have come to the realization that negative signals, especially of the global type, are particularly tricky to get right. The reason for this...
That's correct, it is similar in nature, though the solution does not solve the problem. If you don't post, you don't care about the deboosting; bad actors exist (a fact...
It may be a good stop-gap method, but this does not solve the problem. In this comment I lay out how you would attack this time-based decay. Just unblock and reblock...
IMAE for Noise-Robust Learning: Mean Absolute Error Does Not Treat Examples Equally and Gradient Magnitude's Variance Matters https://arxiv.org/abs/1903.12141
This is the configuration I am using for Mixer MLP:
```
"activation": "mish",
"architecture": "mixer_mlp",
"depth": 12,
"expansion_factor": 2,
"expansion_factor_token": 0.5,
"feature_dropout": 0.2,
"latent_dim": 4096,
"normalization": "none",
"position_encoding": "none",
```
...
Much faster, but still taking 114 seconds per iteration. Same GPU model but a slightly bigger model (300M parameters) in this case, as this is the GPU that just finished an...
Let me know when you want me to test something.