lang2vec
`available_uriel_languages` is not correct
I've noticed that `available_uriel_languages` does not work properly. I am not sure why it filters according to the `fam` features, but the filtering itself seems buggy. The resulting mask has only ~3.5k elements, even though we have 7k languages with URIEL features, and all of the elements in the mask are `True`. Additionally, the number 3.5k is exactly equal to the number of language families. I suspect that the reduction in `np.all` is being done along the wrong axis.

I have sidestepped the issue for now by using data directly from the `feature_predictions.npz` file.
https://github.com/antonisa/lang2vec/blob/82ab4457ae3a45f552b8d70310ac2a259b44c62a/lang2vec/lang2vec.py#L57-L69
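To illustrate the kind of axis mix-up suspected here, a toy sketch with a made-up array (the names `langs` and `has_value` and the data are hypothetical, not taken from lang2vec or its data files). It only shows how the choice of `axis` in `np.all` decides whether the mask has one entry per feature or one per language:

```python
import numpy as np

# Hypothetical toy data: 5 languages (rows) x 3 family features (columns).
# True means the feature has a value for that language.
langs = np.array(["aaa", "bbb", "ccc", "ddd", "eee"])
has_value = np.array([
    [True, True, True],
    [True, True, True],
    [True, False, True],
    [True, True, True],
    [True, True, True],
])

# axis=0 collapses the language dimension: one entry per feature.
per_feature = np.all(has_value, axis=0)

# axis=1 collapses the feature dimension: one entry per language.
per_language = np.all(has_value, axis=1)

print(per_feature.shape)   # (3,)
print(per_language.shape)  # (5,)

# Only a per-language mask can be used to index the language list.
available = langs[per_language]
print(available)
```

If the real mask ends up with as many entries as there are family features rather than languages, a reduction over the language axis like `per_feature` above would be one possible explanation.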
Huh, thanks! I'll look into incorporating this.