wikipron

Massively multilingual pronunciation mining

Results 30 wikipron issues

I noticed this problem for [Armenian](https://en.wiktionary.org/wiki/%D5%A5%D6%80%D5%AF%D6%80%D5%B8%D6%80%D5%A4) and a colleague told me it's also found in [Portuguese](https://en.wiktionary.org/wiki/afetar). For some languages, the pronunciation entry can use a nested list, such that *...

bug

In Wiktionary, German includes Swiss German and Germany German as its dialects. These "dialects" are each labeled with `(Standard German of Germany)` and `(Standard German of Switzerland)` or `Swiss German`....

bug

- [x] Updated `Unreleased` in `CHANGELOG.md` to reflect the changes in code or data. `ger` is split into Swiss German and Germany German. `Nep` duplicate is removed from `languages.json`

In the Slovenian data, some of the vowels with tone marking (e.g. /é/) are transcribed using the precomposed characters (so here, [U+00E9](https://www.compart.com/en/unicode/U+00E9) instead of the sequence [U+0065](https://www.compart.com/en/unicode/U+0065) [U+0301](https://www.compart.com/en/unicode/U+0301)). The module...

enhancement
good first issue
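The precomposed versus decomposed distinction in the Slovenian issue above can be checked with Python's standard `unicodedata` module. This is a minimal sketch, not wikipron's own code: the `codepoints` helper is hypothetical, and the excerpt does not say which normalization form the project prefers.

```python
import unicodedata

# The two encodings of /é/ named in the issue.
precomposed = "\u00e9"        # U+00E9, a single code point
decomposed = "\u0065\u0301"   # U+0065 followed by combining U+0301


def codepoints(text: str) -> list[str]:
    """Hypothetical helper: list the code points in a string."""
    return [f"U+{ord(char):04X}" for char in text]


# The strings render alike but do not compare equal as-is.
assert precomposed != decomposed

# NFD splits the precomposed character into base letter plus combining mark.
nfd = unicodedata.normalize("NFD", precomposed)
# NFC recombines the sequence into the single precomposed character.
nfc = unicodedata.normalize("NFC", decomposed)

print(codepoints(nfd))
print(codepoints(nfc))
```

Normalizing both sides of a comparison to the same form is what makes the two transcriptions match.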

Though Lithuanian is generally said to have a relatively shallow orthography, there are some apparent inconsistencies in how _ie_ is transcribed, as well as issues in the use of dental...

language support

As of at least #509 the custom selector for Latin has been broken. Latin has a [custom selector](src/wikipron/extract/lat.py) because the headwords lack macrons. Now the Romans of course didn't use...

bug
language support
hacktoberfest

The last big scrape was completed in March 2022. This is a tracking bug for a fall 2023 big scrape, which I am assigning to myself. Modulo issues discussed in...

enhancement

We have no effective documentation for the [covering grammars](https://github.com/CUNY-CL/wikipron/tree/master/data/covering_grammar) data library.

* We should probably add a short description to the [data README](https://github.com/CUNY-CL/wikipron/blob/master/data/README.md).
* We should give the exact instructions...

documentation
good first issue

The command line lets the user choose whether to apply casefolding, so that an entry like `English` is either kept as `English` or folded to `english`. But for the scraped data on the repo,...

enhancement
good first issue
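The casefolding choice described in the issue above can be illustrated with Python's built-in `str.casefold`. This is only a sketch: `fold_entry` and its `casefold` parameter are hypothetical names, not wikipron's actual interface.

```python
def fold_entry(word: str, casefold: bool) -> str:
    """Hypothetical helper: return the headword casefolded or unchanged."""
    return word.casefold() if casefold else word


# With folding off, the headword keeps its original capitalization.
kept = fold_entry("English", casefold=False)
# With folding on, it is reduced to its casefolded form.
folded = fold_entry("English", casefold=True)

print(kept, folded)
```

Note that `str.casefold` is more aggressive than `str.lower`: for example, it maps German `ß` to `ss`, so `Straße` becomes `strasse`.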