gpt-2
Intermediate Layer Output
Similar to the issue I posted here: https://github.com/openai/gpt-2/issues/148 -- Is it possible to use the intermediate layer outputs and generate text, ignoring the layers on top? Basically, I want to check the quality of generations as we keep adding more layers. What modifications would I have to make in the src/sample.py script for this? Thanks.
Hmm, I haven't tried something like this, so I don't know for sure. That said, it may be as simple as editing `n_layer` in the model's `hparams.json` to a smaller number. Then only the first `n_layer` layers will be loaded from the checkpoint. You will have to retrain, obviously, since the intermediate layers wouldn't have been optimized for producing text, and would probably produce gibberish by default.
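Not tested, but for reference, here is roughly what overriding it in code (instead of editing the file) might look like, following the pattern the repo's own scripts use to load hparams. The model name `'117M'` and the 6-layer cutoff are just illustrative:

```python
import json
import os

import model  # src/model.py from this repo

model_name = '117M'  # illustrative; use whichever model you downloaded

# Load the published hyperparameters, then shrink n_layer so that only
# the first N transformer blocks are built (and restored from the checkpoint).
hparams = model.default_hparams()
with open(os.path.join('models', model_name, 'hparams.json')) as f:
    hparams.override_from_dict(json.load(f))
hparams.set_hparam('n_layer', 6)  # e.g. keep only the first 6 of 12 blocks
```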
You're right. I did try it, and it does indeed produce gibberish for the first 8 or so layers. My main motive behind this was to capture the various characteristics of text that are learned layer by layer, and how text quality improves (or characteristics change) as we move up the transformer layers. Something like: syntactic properties are captured in the lower layers, then other characteristics further up, etc. Retraining would essentially just mean training with a smaller number of transformer layers, so I guess that's not ideal for this. Keeping this open for now; any more ideas on this would be great!
Hi Bakszero, I think it would be better to add an auxiliary output layer (linear projection + softmax) after each decoder block. That way, the learning ability of each layer is more directly comparable.
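To make the idea concrete, here is a minimal NumPy sketch. The decoder block is a random stand-in, and reusing the token embedding matrix for every auxiliary head (the way GPT-2 ties its final output projection) is my assumption; you could equally give each head its own learned projection:

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

# Toy sizes standing in for GPT-2's hidden size, vocab size, and depth.
n_layer, n_embd, n_vocab, seq_len = 4, 16, 100, 8
rng = np.random.default_rng(0)

wte = rng.normal(size=(n_vocab, n_embd))   # (tied) token embedding matrix
h = rng.normal(size=(seq_len, n_embd))     # embedded input tokens

per_layer_probs = []
for layer in range(n_layer):
    h = h + 0.1 * rng.normal(size=h.shape)  # stand-in for one decoder block
    # Auxiliary head: project this layer's hidden states onto the vocab
    # with the tied embedding matrix, then softmax into a distribution.
    logits = h @ wte.T                       # [seq_len, n_vocab]
    per_layer_probs.append(softmax(logits))

# per_layer_probs[k] is the next-token distribution read off after block
# k+1, so samples drawn from different depths can be compared directly.
```

In the real model you would of course train these heads (optionally with the pretrained blocks frozen), so that each layer's readout is on an equal footing.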
@weiguowilliam I think that's a good idea indeed, thanks!