
What was the latest average loss?

Open iedmrc opened this issue 5 years ago • 6 comments

Hi, In the readme file, it says:

> Dump of generated texts from gpt-2-simple trained on Hacker News titles until April 25th, 2019 (about 603k titles, 30MB of text) for 36,813 steps (12 hours w/ a P100 GPU, costing ~$6). The output is definitely not similar to that of Markov chains.

My questions are:

  1. What was the latest avg_loss you've reached?
  2. Which model (117M or 345M) did you train it with?
  3. Which parameters (especially talking about learning rate) did you use?

Answers to these questions would give us more intuition when training gpt-2-simple on our own datasets.

Thanks for your answer!

iedmrc avatar Jul 26 '19 22:07 iedmrc

FWIW, I've replicated this for another project with 100k steps on a P100. The final loss was somewhere between 0.85 and 0.98, IIRC. I trained the 345M model. The parameters were the same as in @minimaxir's gpt-2-simple Colab notebook.
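Roughly, such a run looks like this with gpt-2-simple (a sketch, not the exact notebook cell: `hn_titles.txt` is a placeholder filename, and the hyperparameters shown are the library's defaults as far as I recall, with `learning_rate=1e-4`; running it needs a GPU and a model download):

```python
import gpt_2_simple as gpt2

# Fetch the 345M checkpoint once, then fine-tune on the titles file.
gpt2.download_gpt2(model_name="345M")

sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              dataset="hn_titles.txt",   # placeholder: one title per line
              model_name="345M",
              steps=100000,
              learning_rate=1e-4,        # gpt-2-simple's default, IIRC
              sample_every=1000,         # print generated samples periodically
              save_every=1000)           # checkpoint periodically
```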

turbo avatar Jul 28 '19 12:07 turbo

Thanks for the answer! Were the results (the samples it generated) satisfactory for you? How long did it take to train 100k steps on a P100 in your case?

iedmrc avatar Jul 28 '19 14:07 iedmrc

> satisfactory

That term is pretty ambiguous. I certainly saw no significant improvement after about 50k steps; at that point the coherence/funniness/uniqueness more or less matched the sample batches in this repo. I saw slightly more interesting results when I mixed in titles from my medium125k dataset.

> How long did it take to train 100k steps on a P100 in your case?

I'm using a Scaleway P100 instance (1 EUR/h). It took me two days, though not with continuous training; each 50k-step segment took about 12 to 14 hours, IIRC.
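As a back-of-envelope check, the throughput implied by those numbers roughly matches the README's run (caveat: the two runs may have used different model sizes, so this is only a sanity check):

```python
# Throughput implied by the numbers in this thread, both on a P100.
readme_steps_per_hour = 36813 / 12     # README: 36,813 steps in 12 hours
segment_steps_per_hour = 50000 / 13    # this run: ~13 h per 50k-step segment

# Estimated wall-clock time for 100k steps at the README's rate.
est_hours_100k = 100000 / readme_steps_per_hour

print(round(readme_steps_per_hour),
      round(segment_steps_per_hour),
      round(est_hours_100k, 1))
```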

The T4 Colab environment should suffice to train at a reasonable speed, though I have not yet found a way to get a T4 environment reliably; I get K80s about half the time when creating a notebook. TPUs might be worth exploring.

turbo avatar Jul 28 '19 14:07 turbo

Thanks for sharing your experiences!

iedmrc avatar Jul 28 '19 14:07 iedmrc

Yeah, a final loss slightly below 1.0 sounds about right.

FWIW, in my own work I don't really pay attention to the absolute value of the loss, just whether it's going down.
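One cheap way to operationalize "just watch whether it's going down" is to compare a recent window of the logged loss against the window before it (a hypothetical helper, not part of gpt-2-simple; it assumes you've been collecting the loss values printed during training):

```python
def is_decreasing(losses, window=5):
    """Crude trend check: is the recent average lower than the earlier one?

    `losses` is a list of loss values logged during training. With fewer
    than 2 * window points we can't judge a trend, so keep training.
    """
    if len(losses) < 2 * window:
        return True
    earlier = sum(losses[-2 * window:-window]) / window
    recent = sum(losses[-window:]) / window
    return recent < earlier

print(is_decreasing([3.2, 2.8, 2.5, 2.1, 1.9, 1.6, 1.4, 1.2, 1.1, 1.0]))
```

If this starts returning False for a while, the model has likely plateaued (as happened around 50k steps above).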

minimaxir avatar Jul 28 '19 14:07 minimaxir

@minimaxir What about computing validation loss? As far as I can see, gpt-2-simple reports training loss but not validation loss. How would you suggest evaluating the trained model?
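In the meantime, one workaround is to hold out a slice of the titles before fine-tuning and evaluate on that file separately (a hypothetical helper; gpt-2-simple itself only reports training loss, so the held-out file would need to be scored by some other means, e.g. computing perplexity with another library):

```python
import random

def split_dataset(lines, valid_frac=0.05, seed=0):
    """Shuffle the titles and hold out a validation slice.

    Returns (train_lines, valid_lines). Deterministic for a given seed,
    so repeated runs produce the same split.
    """
    rng = random.Random(seed)
    lines = lines[:]          # don't mutate the caller's list
    rng.shuffle(lines)
    n_valid = max(1, int(len(lines) * valid_frac))
    return lines[n_valid:], lines[:n_valid]

train, valid = split_dataset([f"title {i}" for i in range(100)])
print(len(train), len(valid))  # 95 5
```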

iedmrc avatar Jul 29 '19 20:07 iedmrc