DNABERT
Long training time and overfitting when reproducing your example
Hello, I am trying to reproduce your results. I configured the environment as instructed in the README and ran your pre-training program on the 3000-sample pre-training dataset you provided. However, even after reducing the number of training steps to one hundredth of the value in your example, the estimated total training time is over 50 hours on 8 NVIDIA A100s. Also, because the number of training steps is reduced, the training loss trends downward but fluctuates heavily, and the validation loss is much larger than the training loss, which looks like overfitting. How should I address these problems to get the expected results? Thanks.
I am running into the same problem. Can we discuss this paper further?