
Intermediate Layer Output

Open bakszero opened this issue 5 years ago • 4 comments

Similar to the issue I posted here: https://github.com/openai/gpt-2/issues/148 -- Is it possible to use the intermediate layer outputs and generate text, ignoring the layers on top? Basically, I want to check the quality of generations as we keep adding more layers. What modifications would I have to make in the src/sample.py script for this? Thanks.

bakszero avatar Jun 19 '19 11:06 bakszero

Hmm, I haven't tried something like this, so I don't know for sure. That said, it may be as simple as editing `n_layer` in the model's `hparams.json` to a smaller number. Then only the first `n_layer` layers will be loaded from the checkpoint. You will have to retrain, obviously, since the intermediate layers wouldn't have been optimized for producing text, and would probably produce gibberish by default.
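The suggested edit can be sketched as a small helper. The field names and values below match the 117M model's `hparams.json` in the openai/gpt-2 repo, but the `truncate_layers` function itself is purely illustrative, not part of the codebase:

```python
import json

def truncate_layers(hparams: dict, k: int) -> dict:
    """Return a copy of the hparams dict keeping only the first k
    transformer blocks; only these layers would then be loaded
    from the checkpoint."""
    if not 0 < k <= hparams["n_layer"]:
        raise ValueError("k must be between 1 and n_layer")
    out = dict(hparams)
    out["n_layer"] = k
    return out

# Values from models/117M/hparams.json in openai/gpt-2.
hparams = {"n_vocab": 50257, "n_ctx": 1024, "n_embd": 768,
           "n_head": 12, "n_layer": 12}

# Write a truncated copy that keeps the first 8 layers.
print(json.dumps(truncate_layers(hparams, 8)))
```

In practice you could just edit the number in `hparams.json` by hand; the point is only that every other hyperparameter stays unchanged.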

nshepperd avatar Jun 21 '19 18:06 nshepperd

> Hmm, I haven't tried something like this, so I don't know for sure. That said, it may be as simple as editing `n_layer` in the model's `hparams.json` to a smaller number. Then only the first `n_layer` layers will be loaded from the checkpoint. You will have to retrain, obviously, since the intermediate layers wouldn't have been optimized for producing text, and would probably produce gibberish by default.

You're right. I did try it, and it indeed produces gibberish for the first 8 or so layers. My main motive was to capture the various characteristics of text that are learned layer by layer, and how text quality improves (or characteristics change) as we move up the transformer layers. Something like: syntactic properties are captured in the lower layers, then other characteristics higher up, etc. Retraining would essentially just mean training a model with fewer transformer layers, so I guess that's not ideal for this. Keeping this open for now, any more ideas on this would be great!

bakszero avatar Jun 23 '19 04:06 bakszero

Hi bakszero, I think it's better if you add an auxiliary output layer (linear projection + softmax) for each decoder layer. That way, the learning ability of each layer becomes more comparable.
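A minimal NumPy sketch of that idea: one auxiliary head per layer maps that layer's hidden state to a next-token distribution, so the layers can be compared directly. All shapes and names here are illustrative, not from the gpt-2 codebase, and in practice each head would be trained with its own language-modelling loss:

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the last axis.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n_layer, n_embd, n_vocab = 4, 8, 16  # toy sizes for illustration

# Stand-in hidden states from each decoder layer for one token.
hidden_states = [rng.standard_normal(n_embd) for _ in range(n_layer)]

# One auxiliary head (linear projection) per layer.
heads = [rng.standard_normal((n_embd, n_vocab)) for _ in range(n_layer)]

# Per-layer next-token distributions, one per decoder layer.
per_layer_probs = [softmax(h @ W) for h, W in zip(hidden_states, heads)]
```

You could then decode greedily from any layer's distribution, or compare per-layer perplexities on held-out text, to see where different characteristics emerge.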

weiguowilliam avatar Sep 21 '19 21:09 weiguowilliam

@weiguowilliam I think that's a good idea indeed, thanks!

bakszero avatar Sep 22 '19 05:09 bakszero