
Outputs can contain values larger than the max size of rev_lang_vocab

Open ganeshjawahar opened this issue 3 years ago • 2 comments

When the model emits ids larger than the size of rev_lang_vocab, an IndexError is thrown at this line:

https://github.com/davidjurgens/equilid/blob/master/equilid/equilid.py#L667

As a result, the predictions list is empty, which leads to erroneous results.
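A minimal, hypothetical reproduction of this failure mode. It assumes that decoding takes an argmax over each step's logits and then indexes rev_lang_vocab with the result (the pattern used in the TensorFlow seq2seq translate example); the helper names `argmax` and `decode` and the four labels are illustrative only, not equilid's actual code.

```python
# Hypothetical reproduction: argmax over a 40k-wide output row, then a
# lookup into a much smaller reverse label vocabulary.

def argmax(row):
    """Return the index of the largest value in a list of floats."""
    return max(range(len(row)), key=lambda i: row[i])

def decode(output_logits, rev_lang_vocab):
    """Map each step's argmax id to a label; raises IndexError on bad ids."""
    outputs = [argmax(row) for row in output_logits]
    return [rev_lang_vocab[i] for i in outputs]

# Toy setup: 4 real labels (hypothetical), but the output layer is 40000 wide.
rev_lang_vocab = ["eng", "spa", "fra", "deu"]
row = [0.0] * 40000
row[12345] = 1.0  # an output unit with no matching label gets the top score

predictions = []
try:
    predictions = decode([row], rev_lang_vocab)
except IndexError as err:
    print("IndexError:", err)  # list index out of range

print(predictions)  # [] : the predictions list stays empty
```

Any id at or above len(rev_lang_vocab) has no label to map to, so the lookup fails and nothing is appended.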

ganeshjawahar, Oct 09 '21 06:10

On closer inspection, I find that the second axis of the output logits has size 40k. This means the output can contain indices from 0 to 39999, which is a problem when trying to map them to the language labels.
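One possible workaround, sketched under the assumption that the real labels occupy the contiguous ids 0 to len(rev_lang_vocab) - 1: restrict the argmax to that prefix of each logits row so that ids up to 39999 can never be selected. The names `masked_argmax` and `safe_decode` are hypothetical. This only hides the symptom; sizing the output layer to the number of labels would address the cause.

```python
def masked_argmax(row, num_labels):
    """Argmax over only the first num_labels logits.

    Assumes label ids are contiguous from 0 to num_labels - 1, so any
    index >= num_labels (up to 39999 with a 40k-wide output) is ignored.
    """
    if num_labels <= 0 or num_labels > len(row):
        raise ValueError("num_labels must be in 1..len(row)")
    return max(range(num_labels), key=lambda i: row[i])

def safe_decode(output_logits, rev_lang_vocab):
    """Decode each step into a label drawn only from rev_lang_vocab."""
    n = len(rev_lang_vocab)
    return [rev_lang_vocab[masked_argmax(row, n)] for row in output_logits]

rev_lang_vocab = ["eng", "spa", "fra", "deu"]  # hypothetical labels
row = [0.0] * 40000
row[12345] = 5.0  # out-of-range unit has the highest overall score
row[1] = 2.0      # best score among the real labels
print(safe_decode([row], rev_lang_vocab))  # ['spa']
```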

ganeshjawahar, Oct 09 '21 07:10

Further inspection shows that the default values of both the char vocab size and the lang vocab size are 40k. Is that expected?

https://github.com/davidjurgens/equilid/blob/master/equilid/equilid.py#L81
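If the intent is for the output layer to have one unit per language label, one option is to derive the lang vocab size from the label vocabulary rather than relying on a 40k default. The sketch below is hypothetical: the `--lang_vocab_size` flag only mirrors the default discussed here, and it assumes a label file with one label per line; it is not equilid's actual configuration code.

```python
import argparse

def count_labels(lines):
    """Count distinct non-empty labels, one label per line (assumed format)."""
    return len({line.strip() for line in lines if line.strip()})

def build_parser():
    parser = argparse.ArgumentParser()
    # Hypothetical flag mirroring the 40k default discussed in this issue.
    parser.add_argument("--lang_vocab_size", type=int, default=40000)
    return parser

def resolve_lang_vocab_size(args, label_lines):
    """Prefer the actual label count over an oversized default."""
    n = count_labels(label_lines)
    if args.lang_vocab_size != n:
        print(f"lang_vocab_size={args.lang_vocab_size} but found {n} labels; using {n}")
    return n

args = build_parser().parse_args([])  # no CLI args, so the 40000 default applies
labels = ["eng\n", "spa\n", "fra\n", "deu\n"]  # hypothetical label file contents
print(resolve_lang_vocab_size(args, labels))  # 4
```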

ganeshjawahar, Oct 09 '21 07:10