Documenting the notation in use.

Open Coeur opened this issue 7 years ago • 3 comments

https://en.wikipedia.org/wiki/ARPABET

Jan 17 '19 07:01 Coeur

cmudict was developed primarily for use in speech recognition. At some point it had ~50 symbols (e.g. aspirated stops like TH, DH; flaps, DX; AX/AH; and other variants). It was believed that maintaining phonetic distinctions was important. Turns out it wasn't (acoustic modeling got better).

Jun 01 '23 12:06 Alexir

@lenzo-ka @Coeur I agree. At minimum, the comment should say something like, "CMUdict transcriptions use a modified version of ARPABET encodings."
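
To make "modified ARPABET" concrete, here is a minimal sketch of how a single entry might be split into base symbols and stress digits. It assumes the commonly seen layout of a headword followed by whitespace-separated uppercase symbols, with an optional trailing 0, 1, or 2 on vowels; `parse_entry` is a hypothetical helper, not part of any cmudict tooling, and the sample line is only illustrative.

```python
import re

# Assumption: a symbol is uppercase letters plus an optional stress digit (0, 1, 2).
SYMBOL = re.compile(r"([A-Z]+)([012])?")

def parse_entry(line):
    """Split one dictionary line into (word, [(symbol, stress_or_None), ...])."""
    word, *symbols = line.split()
    phones = []
    for sym in symbols:
        match = SYMBOL.fullmatch(sym)
        if match is None:
            raise ValueError(f"unexpected symbol: {sym!r}")
        base, stress = match.groups()
        # Symbols without a digit (typically consonants) get stress None.
        phones.append((base, int(stress) if stress is not None else None))
    return word, phones

word, phones = parse_entry("HELLO  HH AH0 L OW1")
print(word, phones)
```

The stress digit is kept separate from the base symbol so that a reader can compare the base symbols against the ARPABET table linked above without the digits getting in the way.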

Jul 17 '23 23:07 danmartinez

Updated the PR, taking into account the review.

Note: please just apply your desired improvements directly; there's no need to wait years for the original author, ha ha.

Jul 22 '23 16:07 Coeur