Kyle Gorman
I was wondering if you could provide any information about where the names data was obtained and what license, if any, it's under.
Currently we are using the ISO 639-2 "bibliographic" codes ("ger" for German). It seems to me that these are not very widely used and that they reduce compatibility with other multilingual resources...
* A script for computing KPI numbers (languages, dialects, scripts, and number of prons) should be incorporated into the big scrape workflow. @kylebgorman's draft is [here](https://gist.github.com/kylebgorman/a32fd2c91c862cd508de9b14fbba80dd).
* @jacksonllee proposes that...
Persian, nonstandardly, [uses ~ to separate variants](https://github.com/kylebgorman/wikipron/blob/master/data/tsv/per_phonemic.tsv#L273). Fix these upstream, and then rescrape.
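
Before fixing entries upstream, it may help to list the affected rows. The following is only a sketch: it assumes each TSV line is a word and a pronunciation separated by a single tab, and `lines_with_tilde` is a hypothetical helper name, not part of WikiPron.

```python
def lines_with_tilde(lines):
    """Yields (line number, word, pron) for rows whose pron contains "~".

    Assumes each line has the form "word<TAB>pron". Only the ASCII
    tilde (U+007E) is matched, not the combining tilde (U+0303).
    """
    for number, line in enumerate(lines, start=1):
        word, _, pron = line.rstrip("\n").partition("\t")
        if "~" in pron:
            yield number, word, pron
```

Passing an open file handle for the Persian TSV as `lines` would yield the rows to inspect on Wiktionary.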
Several languages use ZERO WIDTH SPACE and ZERO WIDTH NON-JOINER, which, as the names suggest, take up no visible space. Let's look into why and see whether that's a bug upstream...
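
One way to see how widespread this is would be to count the two code points per file. This is a minimal sketch, not an existing WikiPron utility; `count_zero_width` is a hypothetical name, and it simply scans whatever lines it is given.

```python
import collections
import unicodedata

# The two code points named above: U+200B and U+200C.
ZERO_WIDTH = ("\u200b", "\u200c")


def count_zero_width(lines):
    """Counts occurrences of each zero-width character, keyed by Unicode name."""
    counts = collections.Counter()
    for line in lines:
        for char in line:
            if char in ZERO_WIDTH:
                counts[unicodedata.name(char)] += 1
    return counts
```

Running this over each TSV file and printing the non-empty counters would show which languages are affected and by which character.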
As [reported here](https://github.com/sigmorphon/2020/issues/9), there are some inconsistencies with /l/ and the dental stops. As [discussed here](https://en.wiktionary.org/wiki/Wiktionary:Information_desk/2020/April#Performing_bulk_edits), there is a pronunciation module and pron template for Bulgarian on Wiktionary; we may...
Though Lithuanian is generally said to have a relatively shallow orthography, there are some apparent inconsistencies in how _ie_ is transcribed, as well as issues in the use of dental...
### Description & Motivation

The `Trainer`'s `min_time` argument is expected to be a string of the form /(\d\d):(\d\d):(\d\d):(\d\d)/. One way in which parsing can fail (simple example: `--min_time...
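
For illustration, strict validation of such a string might look like the sketch below. It is not the library's actual parser: `parse_duration` is a hypothetical helper, and reading the four two-digit groups as days, hours, minutes, and seconds is an assumption.

```python
import datetime
import re

# Four two-digit groups separated by colons, as in the pattern above.
_PATTERN = re.compile(r"(\d\d):(\d\d):(\d\d):(\d\d)")


def parse_duration(value: str) -> datetime.timedelta:
    """Parses a duration string, raising ValueError on malformed input.

    Assumes the groups mean days, hours, minutes, and seconds.
    """
    match = _PATTERN.fullmatch(value)
    if match is None:
        raise ValueError(
            f"Expected a string of the form DD:HH:MM:SS, got {value!r}"
        )
    days, hours, minutes, seconds = (int(group) for group in match.groups())
    return datetime.timedelta(
        days=days, hours=hours, minutes=minutes, seconds=seconds
    )
```

Using `fullmatch` rather than `match` rejects strings with trailing or leading junk, so malformed input produces a descriptive error instead of a failure further downstream.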
As of at least #509, the custom selector for Latin has been broken. Latin has a [custom selector](src/wikipron/extract/lat.py) because the headwords lack macrons. Now the Romans of course didn't use...
The last big scrape was completed in March 2022. This is a tracking bug for a fall 2023 big scrape, which I am assigning to myself. Modulo issues discussed in...