
`available_uriel_languages` is not correct

Open matus-pikuliak opened this issue 1 year ago • 1 comments

I've noticed that `available_uriel_languages` does not work properly. I am not sure why it is filtered according to the `fam` features, but the filtering itself seems buggy. The resulting mask has only ~3.5k elements, even though we have 7k languages with URIEL features, and all of the elements in the mask are True. Additionally, 3.5k is exactly the number of language families. I suspect that the reduction in `np.all` is being done along the wrong axis.
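To illustrate what I mean, here is a minimal sketch on toy data. It is not the actual lang2vec code: the matrix layout (rows are languages, columns are families), the `-1` missing-value marker, and all variable names are assumptions made up for this example. It only shows how reducing along the wrong axis yields a mask with one entry per family instead of one per language.

```python
import numpy as np

# Toy stand-in for a family-feature matrix (hypothetical layout):
# rows are languages, columns are language families.
n_languages, n_families = 6, 3
fam = np.zeros((n_languages, n_families))
# Assign each toy language to exactly one family.
fam[np.arange(n_languages), np.arange(n_languages) % n_families] = 1.0

# Assumed convention for this sketch: -1 marks a missing value.
has_value = fam != -1.0

# Reducing along axis 0 collapses the language dimension,
# so the mask has one entry per family.
mask_axis0 = np.all(has_value, axis=0)

# Reducing along axis 1 collapses the family dimension,
# so the mask has one entry per language.
mask_axis1 = np.all(has_value, axis=1)

print(mask_axis0.shape, bool(mask_axis0.all()))
print(mask_axis1.shape, bool(mask_axis1.all()))
```

With no missing values in the toy matrix, both masks are all True, but only the `axis=1` mask has the length needed to index languages. That matches the symptom described above: an all-True mask whose length equals the number of families.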

I have sidestepped the issue for now by using data directly from the feature_predictions.npz file.

https://github.com/antonisa/lang2vec/blob/82ab4457ae3a45f552b8d70310ac2a259b44c62a/lang2vec/lang2vec.py#L57-L69

matus-pikuliak, Jan 01 '23 18:01