DNABERT Long training time and overfitting when reproducing your example

Open warm-ice0x00 opened this issue 2 years ago • 1 comment

Hello, I am trying to reproduce your results. I configured the environment as instructed in the README and ran your pre-training program on the pre-training dataset you provided, which contains 3,000 samples. However, even after reducing the number of training steps to one hundredth of the value in your example, the estimated total training time is over 50 hours on 8 NVIDIA A100 GPUs. In addition, with the reduced number of training steps, the training loss trends downward but fluctuates heavily, and the validation loss is much larger than the training loss, which looks like overfitting. How should I address these problems to get the expected results? Thanks.

Apr 02 '22 09:04 warm-ice0x00

I have run into the same problem. Could we discuss this paper further?

Feb 22 '23 11:02 DwanZhang-AI
