german-gpt2
Low log-probabilities for a grammatical German sentence
Hi,
Are there any evaluation scores (such as perplexity) available for the model? I've come across some unexpected results, which make me wonder about the model's general performance and its training procedure. Was the model trained from scratch on German-only data, or from a checkpoint of an English (or other) model?
Here are my unexpected results:
-
I used the minicons library to obtain log-likelihood scores for each token in an example German sentence. The log-probabilities were quite low, given that the sentence is German and grammatical. The probabilities for English were actually higher. See this issue showing detailed results.
-
I also wanted to use the evaluate library to get the perplexity of a couple of sentences, but the model could not be loaded for some reason. I had no issue using the English GPT-2 with the same set of sentences. This issue shows the error in detail.
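For reference, token-level log-probabilities and sentence perplexity are directly related: perplexity is the exponential of the negative mean token log-probability. A minimal sketch of that relationship, using made-up placeholder log-prob values rather than actual scores from the model:

```python
import math

# Placeholder per-token log-probabilities (natural log), for illustration only;
# real values would come from a scorer library such as minicons.
token_logprobs = [-2.1, -3.5, -1.8, -4.2, -2.9]

# Perplexity = exp(-mean token log-probability).
mean_logprob = sum(token_logprobs) / len(token_logprobs)
perplexity = math.exp(-mean_logprob)

print(round(mean_logprob, 2))  # -2.9
print(round(perplexity, 2))   # 18.17
```

Low per-token log-probabilities for a grammatical sentence translate directly into a high perplexity, which is why the two observations above point at the same underlying issue.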
I'd be happy to hear your thoughts on this!
See my comment here: https://github.com/huggingface/evaluate/issues/313#issuecomment-1273748511. The tokenizer wasn't created properly: it should only use ids between 0 and 50264 (vocab_size=50265).
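A quick way to sanity-check this is to verify that every id a tokenizer emits is strictly below its `vocab_size`. The helper and the example id lists below are hypothetical; real ids would come from something like `tokenizer(text).input_ids`:

```python
# All valid ids for vocab_size=50265 lie in the range 0..50264.
VOCAB_SIZE = 50265

def ids_in_range(ids, vocab_size=VOCAB_SIZE):
    """Return True if every token id falls within [0, vocab_size)."""
    return all(0 <= i < vocab_size for i in ids)

good_ids = [318, 50264, 1024]    # all within range
bad_ids = [318, 50265, 1024]     # 50265 is out of range

print(ids_in_range(good_ids))  # True
print(ids_in_range(bad_ids))   # False
```

An out-of-range id would index past the end of the model's embedding matrix, which is consistent with the loading error seen with the evaluate library.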