Small datasets
I'm curious: I've trained a 20M-parameter Mamba model for molecular generation, and it seems to fare quite poorly when trained on small datasets. I added a dropout layer since it overfits otherwise (roughly as in the sketch below), but would Mamba perhaps need a lot of intricate optimisation and regularisation to work well on smaller datasets?
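For context, this is a minimal sketch of the kind of thing I mean, assuming the official `mamba_ssm` package; the `RegularizedMambaBlock` wrapper and the dropout placement are my own for illustration, not part of the library:

```python
import torch
import torch.nn as nn
from mamba_ssm import Mamba  # official mamba-ssm package (requires CUDA)

class RegularizedMambaBlock(nn.Module):
    """Hypothetical pre-norm residual block that applies dropout to the
    Mamba mixer output, in the spirit of AWD-LSTM-style regularisation."""

    def __init__(self, d_model: int, dropout: float = 0.3):
        super().__init__()
        self.norm = nn.LayerNorm(d_model)
        self.mixer = Mamba(d_model=d_model, d_state=16, d_conv=4, expand=2)
        self.drop = nn.Dropout(dropout)

    def forward(self, x):
        # residual connection; dropout regularises only the mixer branch
        return x + self.drop(self.mixer(self.norm(x)))

# usage: inputs of shape (batch, seq_len, d_model)
block = RegularizedMambaBlock(d_model=256, dropout=0.3).cuda()
x = torch.randn(2, 128, 256, device="cuda")
y = block(x)  # same shape as x
```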
I know earlier LSTM and RNN models needed this kind of careful tuning (e.g. the AWD-LSTM paper, https://arxiv.org/pdf/1708.02182v1), and I'm curious about your intuition here.