What should we do for Mayan and Indigenous_languages_of_the_Americas languages

Open benoit74 opened this issue 2 months ago • 5 comments

Some books are in following languages at Gutenberg level:

  • myn => https://en.wikipedia.org/wiki/Mayan_languages
  • nai => https://en.wikipedia.org/wiki/Indigenous_languages_of_the_Americas

These are valid ISO639-2 codes, but they have been "split" into many ISO639-3 codes.

Do we really want to add all of these codes as suggested in https://github.com/openzim/overview/issues/51 ?

benoit74 avatar Oct 13 '25 14:10 benoit74

It is a good way to see what happens if we do so!

kelson42 avatar Oct 14 '25 02:10 kelson42

My question is not a technical one but more from a "veracity" standpoint.

I'm really not sure which ISO639-3 codes should be included (what is myn split into in ISO639-3? Having a look at https://glottolog.org/resource/languoid/id/maya1287 seems to indicate tens of ISO639-3 subdivisions) and I have no clue which of these languages are really used in the books.

It is not like a language which has been split in two and you have significant chances both are present in the ZIM in the end. Here we speak about apparently tens of codes, and probably only a handful are really present.

I feel like it would be very deceptive to advertise languages which do not really exist in the ZIM.

benoit74 avatar Oct 14 '25 09:10 benoit74

  • Contact Eric to see whether the problem can be fixed upstream
  • Deactivate the recipes for both languages
  • Ignore these books for the mul
  • Postpone the issue

kelson42 avatar Oct 17 '25 12:10 kelson42

  • Recipes have been disabled with a mention about this issue
  • Corresponding books are already ignored in the mul
  • Postponed to later
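
For illustration only, the "ignored in the mul" step could look roughly like the sketch below. It is not the actual openzim/gutenberg code: the function name, the book dictionaries and the book IDs are all hypothetical, and the only values taken from this issue are the two codes myn and nai.

```python
# Hypothetical sketch, not the real scraper logic.
# The two ISO 639-2 collective codes discussed in this issue.
COLLECTIVE_CODES = {"myn", "nai"}


def split_books(books):
    """Separate books tagged only with collective codes from all others."""
    kept, skipped = [], []
    for book in books:
        languages = set(book["languages"])
        # Skip a book only when every language it declares is a collective code.
        if languages and languages <= COLLECTIVE_CODES:
            skipped.append(book)
        else:
            kept.append(book)
    return kept, skipped


# Made-up sample data, for demonstration only.
books = [
    {"id": 1, "languages": ["en"]},
    {"id": 2, "languages": ["myn"]},
    {"id": 3, "languages": ["nai"]},
    {"id": 4, "languages": ["es", "myn"]},
]
kept, skipped = split_books(books)
print([b["id"] for b in kept])
print([b["id"] for b in skipped])
```

In this sketch a book that also declares a specific language (such as the made-up book 4) is kept, on the assumption that it can still be classified under that specific language.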

@eshellman do you know if it would be possible to assign a more specific language to these books than the very generic myn and nai collections? At Kiwix we've chosen to use ISO639-3, and while myn and nai are valid from the RFC4646 / IANA Language Subtag Registry perspective, they do not really help us to properly classify these books / ZIMs.

benoit74 avatar Oct 17 '25 14:10 benoit74

Our cataloger follows what Library of Congress says.

I'm guessing that if you look in the 3 books of each myn and nai, you might find that they aren't a specific myn or nai language.

Eric

eshellman avatar Oct 17 '25 20:10 eshellman