Step 4: Evaluating Models with knn: Incorrect perplexity (ppl)
When I reproduce Step 4: Evaluating Models, the perplexity (ppl) I get from running knn-LM is around 17. Could you please explain why this might be? I would greatly appreciate a response.
Hi Rubin, Thank you for your interest in our work.
Does it still happen when you use our datastore and our index?
Best, Uri
Dear author,
Thank you very much for your response! I am using the neulab/gpt2-finetuned-wikitext103 model, and the dataset is Wikitext-103. The index and vals files I am using are gpt2/index_gpt2_116988150_768.indexed and gpt2/dstore_gpt2_116988150_768_vals.npy, respectively, from the link https://knn-transformers.s3.amazonaws.com/index.html.
However, when using the --knn option, the perplexity (PPL) of GPT-2 is 17.34, which is significantly higher than the 12.57 you report. Do you know what might be causing this discrepancy?
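For reference, here is a quick sanity check I can run on the downloaded files (a minimal sketch; the local paths, and the dtype/shape of the vals memmap, are my assumptions rather than something documented, so please correct me if they are wrong):

```python
import faiss
import numpy as np

# Sanity-check the downloaded index: the filename suggests 116,988,150 keys of dimension 768.
index = faiss.read_index("gpt2/index_gpt2_116988150_768.indexed")
print(index.ntotal, index.d)  # expecting roughly 116988150 and 768, assuming the filename encodes size and dim

# The vals file appears to be a raw memmap of token ids rather than a standard .npy;
# the dtype and shape below are assumptions -- the exact memmap parameters should be
# taken from the KNNWrapper code in the repository.
vals = np.memmap("gpt2/dstore_gpt2_116988150_768_vals.npy",
                 dtype=np.int32, mode="r", shape=(116988150, 1))
print(vals[:5].ravel())  # first few stored token ids
```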
Another question: in your article, RetoMaton is compared using FoSS, and according to your figure, a smaller FoSS value corresponds to a lower PPL and better performance. However, for knn-LM there doesn't seem to be any FoSS-related hyperparameter in the code.
If I could receive your reply, it would be greatly appreciated.
With knn_gpu set to False I get a perplexity of 12.5734, and with knn_gpu set to True the perplexity becomes 17.3421.
Hi, I am also having the same issue - I get a perplexity score of 17 with knn-lm on the wikipedia dataset.
RetoMaton is in the ballpark of 12 to 13, improving on the finetuned baseline, and this stays consistent when I change the min_knns parameter. Could I please clarify this too? I am using knn_gpu = True, and I built the wiki datastore using the code in the repository.
I suspect that the loss calculation differs between knn-LM and RetoMaton and that this affects perplexity. When I compare the generated outputs of knn-LM and RetoMaton, I get almost identical ROUGE scores, but the perplexity of knn-LM is still significantly higher.
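To make sure I am comparing the same quantity, this is how I understand perplexity to be computed from the interpolated kNN-LM probabilities (the numbers and the lambda value below are made-up placeholders, not values from the repo):

```python
import numpy as np

# Made-up per-token probabilities of the gold tokens, just to illustrate the computation.
p_lm  = np.array([0.30, 0.05, 0.60, 0.10])   # base LM probabilities
p_knn = np.array([0.50, 0.20, 0.55, 0.02])   # kNN retrieval probabilities
lmbda = 0.25                                 # interpolation weight (placeholder value)

p_mix = lmbda * p_knn + (1.0 - lmbda) * p_lm  # kNN-LM: p = lambda * p_knn + (1 - lambda) * p_lm
nll   = -np.log(p_mix)                        # per-token negative log-likelihood
ppl   = float(np.exp(nll.mean()))             # perplexity = exp(mean NLL)
print(ppl)
```

So even if the argmax token (and therefore greedy generation and ROUGE) is unchanged, shifts in the retrieved probabilities move the NLL and hence the perplexity.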
Kindly seeking help and I hope this question is reasonable. I am not a domain expert so if I'm doing something wrong please let me know.
> With knn_gpu set to False I get a perplexity of 12.5734, and with knn_gpu set to True the perplexity becomes 17.3421.
Is my issue a GPU vs CPU setup consideration then?
Hi folks, Thank you for your interest in our work.
Unfortunately, this codebase is 4 years old. I don't have the capacity to investigate why KNN-GPU gives different results from KNN-CPU, and I don't have access to the same servers. Many things have probably changed in the Faiss library, which was unstable to begin with.
If KNN-CPU works, I recommend checking the Faiss documentation to see whether there is anything that can make the GPU version equivalent.
Best, Uri
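Following up on this suggestion, one way to narrow it down is to compare the CPU index and its GPU clone directly on the same queries. This is a minimal, untested sketch: it assumes faiss-gpu is installed, that the GPU cloner supports this index type, and that the index fits in GPU memory (with ~117M entries it may be necessary to try a smaller test index instead), and the query vectors here are random stand-ins.

```python
import faiss
import numpy as np

# Load the prebuilt index on CPU, then clone it to GPU 0.
cpu_index = faiss.read_index("gpt2/index_gpt2_116988150_768.indexed")
res = faiss.StandardGpuResources()
gpu_index = faiss.index_cpu_to_gpu(res, 0, cpu_index)

# Random stand-in queries of the right dimensionality, just to compare behaviour.
queries = np.random.rand(8, cpu_index.d).astype("float32")
k = 32

d_cpu, i_cpu = cpu_index.search(queries, k)
d_gpu, i_gpu = gpu_index.search(queries, k)

# If the retrieved neighbor ids or distances differ noticeably, the perplexity gap
# likely comes from Faiss GPU settings (e.g. float16 options or nprobe) rather than
# from the wrapper's loss computation.
print((i_cpu == i_gpu).mean(), np.abs(d_cpu - d_gpu).max())
```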
Thanks for your help and prompt response. Noted on this.
Hi, I was able to reproduce your perplexity score of 12.57 for the GPT-2 finetuned model from the HF page with the CPU setting for knn-LM (using the default wrapper parameters).