Federico Andres Lois

Results: 45 comments by Federico Andres Lois

Yes. I will let @ayende try this one out on the Kobo dataset before changing it...

From my experience in reinforcement learning, I have come to realize that negative signals, especially global ones, are particularly tricky to get right. The reason for this...

That's correct; it is similar in nature, though the solution does not solve the problem. If you don't post, you don't care about the deboosting; bad actors exist (a fact...

It may be a good stop-gap method, but this does not solve the problem. In this comment I lay out how you would attack this time-based decay. Just unblock and reblock...

IMAE for Noise-Robust Learning: Mean Absolute Error Does Not Treat Examples Equally and Gradient Magnitude's Variance Matters https://arxiv.org/abs/1903.12141

This is the configuration I am using for Mixer MLP:

```
"activation": "mish",
"architecture": "mixer_mlp",
"depth": 12,
"expansion_factor": 2,
"expansion_factor_token": 0.5,
"feature_dropout": 0.2,
"latent_dim": 4096,
"normalization": "none",
"position_encoding": "none",
```
...

Much faster, but still taking 114 seconds per iteration. Same GPU model but a slightly bigger model (300M parameters) in this case, as this is the GPU that just finished an...

Let me know when you want me to test something.