wikipron
[arm] optional material in parentheses
There are Armenian entries which show optional material in parentheses. For example, any initial sibilant-stop cluster gets an obligatory schwa in Western Armenian, but an optional one in Eastern Armenian: [(ə)stɑˈnɑl].
The script seems to skip extracting parentheses, so the extracted Eastern Armenian pronunciation of ստանալ comes out as [ə s t a n a l] instead of [(ə) s t a n a l].
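A minimal sketch of the behavior described above, assuming the parentheses are simply deleted before the pronunciation is split into space-separated segments. The helper names are hypothetical and this is not WikiPron's actual code. The segmenter is also far simpler than real IPA segmentation: it only merges combining diacritics into the preceding character.

```python
import unicodedata

# Hypothetical helpers for illustration; this is not WikiPron's actual code.


def naive_segment(pron: str) -> list[str]:
    """Split an IPA string into one-character segments, merging combining
    diacritics into the preceding segment (a deliberate simplification)."""
    segments: list[str] = []
    for ch in pron:
        if segments and unicodedata.combining(ch):
            segments[-1] += ch
        else:
            segments.append(ch)
    return segments


def strip_parens_then_segment(pron: str) -> str:
    # Deleting "(" and ")" before segmenting loses the fact that ə is optional.
    stripped = pron.replace("(", "").replace(")", "")
    return " ".join(naive_segment(stripped))


print(strip_parens_then_segment("(ə)stanal"))  # ə s t a n a l
```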
So "optionality in parentheses", while not uncommon, isn't really mentioned in any of the Wiktionary pronunciation specs I've seen.
I don't know what WikiPron should do here. We try not to do too many hermeneutics with the pronunciations; if they're not clear, we ought to fix them upstream and rescrape. I would say this is a good case of "not clear": it will make sense to linguists of a certain stripe, and perhaps speakers of the relevant dialects, but it would be much better to just list the logical possibilities instead of adding a sort of regular language of pronunciations.
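The "list the logical possibilities" alternative can be made concrete with a small expansion step. This is a minimal sketch with a hypothetical helper, not anything WikiPron does. It assumes parentheses never nest and always mark optional material, which is exactly the kind of interpretation that is safer to settle upstream.

```python
import itertools
import re

# Hypothetical helper for illustration; not part of WikiPron.
# Matches a non-nested parenthesized span such as "(ə)".
OPTIONAL = re.compile(r"\(([^()]*)\)")


def expand_optional(pron: str) -> list[str]:
    """List every variant of `pron` with each optional span present or absent."""
    # re.split with one capture group alternates: fixed, optional, fixed, ...
    parts = OPTIONAL.split(pron)
    fixed = parts[0::2]
    optional = parts[1::2]
    variants = []
    for choices in itertools.product((True, False), repeat=len(optional)):
        pieces = [fixed[0]]
        for keep, opt, rest in zip(choices, optional, fixed[1:]):
            pieces.append(opt if keep else "")
            pieces.append(rest)
        variants.append("".join(pieces))
    return variants


print(expand_optional("(ə)stɑˈnɑl"))  # ['əstɑˈnɑl', 'stɑˈnɑl']
```

With n optional spans the helper returns 2**n variants; a string without parentheses comes back unchanged as a one-item list.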
If you are looking at the static scrape in data/, it's also possible that the Wiktionary data has changed since we last ran it, because Armenian seems to be under a lot of active development.
(Replying to Hossep Dolatian's post of Sun, Jan 10, 2021, 12:43 AM, on https://github.com/kylebgorman/wikipron/issues/315, about the Wiktionary entry https://en.wiktionary.org/wiki/%D5%BD%D5%BF%D5%A1%D5%B6%D5%A1%D5%AC.)
It's understandable that you shouldn't do much hermeneutics. And ideally it should be a thing that the Wiktionary entries themselves should fix. But I don't think it'll be changed any time soon :/
For WikiPron, I'm curious whether there's a way to get a completely unfiltered extraction from an entry. For example, when I last ran WikiPron a few days ago, the headword ստանալ [(ə)stanal] was extracted as [ə s t a n a l]. Is there some parameter I could set so that the extracted form is [(ə) s t a n a l] or [( ə ) s t a n a l], i.e., so that the script treats the parentheses as separate "IPA" symbols?
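The two requested output formats differ only in how the parentheses are tokenized. A minimal sketch of both, using a hypothetical function and a deliberately naive segmenter; it is not WikiPron's code and says nothing about how WikiPron itself handles parentheses.

```python
import unicodedata

# Hypothetical helper for illustration; not WikiPron's actual code.


def segment_keep_parens(pron: str, group: bool) -> str:
    """Segment an IPA string without deleting parentheses.

    group=True keeps each parenthesized span as one token:  "(ə) s t a n a l"
    group=False makes "(" and ")" separate tokens:          "( ə ) s t a n a l"
    Combining diacritics outside a group join the preceding segment.
    """
    segments: list[str] = []
    inside = False  # only used when group=True: are we between "(" and ")"?
    for ch in pron:
        if group and ch == "(":
            segments.append(ch)
            inside = True
        elif group and inside:
            segments[-1] += ch  # everything up to ")" joins the open token
            if ch == ")":
                inside = False
        elif segments and unicodedata.combining(ch):
            segments[-1] += ch
        else:
            segments.append(ch)
    return " ".join(segments)


print(segment_keep_parens("(ə)stanal", group=True))   # (ə) s t a n a l
print(segment_keep_parens("(ə)stanal", group=False))  # ( ə ) s t a n a l
```

Grouping keeps the optional material visibly attached to its segment, while separate tokens keep every symbol one-to-one with a character; either way, downstream tools still have to know how to interpret the parentheses.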
Nothing exists for that yet, but I suppose it could be built. The extraction itself isn't terribly complex: the first version of the script was maybe 30 lines!
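To illustrate why the core extraction step is small, here is a stdlib-only sketch that pulls the raw text of `<span class="IPA">` elements out of an already-downloaded Wiktionary HTML page. The `IPA` class name is an assumption about Wiktionary's markup, the names are hypothetical, and this does not reproduce WikiPron's real scraper, which also has to pick out the right language and dialect.

```python
from html.parser import HTMLParser

# Hypothetical sketch; assumes pronunciations sit in <span class="IPA"> elements.


class IPASpanExtractor(HTMLParser):
    """Collect the raw text of every <span class="IPA"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0  # > 0 while inside an IPA span; counts nested <span>s
        self.current: list[str] = []
        self.prons: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "span":
            return
        if self.depth:
            self.depth += 1  # a nested <span> inside an IPA span
        elif "IPA" in (dict(attrs).get("class") or "").split():
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if tag == "span" and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.prons.append("".join(self.current))

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract_ipa(html: str) -> list[str]:
    parser = IPASpanExtractor()
    parser.feed(html)
    parser.close()
    return parser.prons


page = '<li>Eastern: <span class="IPA">[(ə)stɑˈnɑl]</span></li>'
print(extract_ipa(page))  # ['[(ə)stɑˈnɑl]']
```

On this assumed markup the span text comes back with its parentheses intact; in this sketch, any stripping or segmentation would be a separate, later step.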
(Replying to Hossep Dolatian's comment of Sun, Jan 10, 2021, 9:41 PM: https://github.com/kylebgorman/wikipron/issues/315#issuecomment-757596570.)
Hi. The README recommends using the --no-skip-parens flag. I just installed wikipron with pip (and pip3), but the argument --no-skip-parens is not recognized:
usage: wikipron [-h] [--phonetic] [--stress] [--no-stress]
[--syllable-boundaries] [--no-syllable-boundaries]
[--dialect DIALECT] [--casefold] [--cut-off-date CUT_OFF_DATE]
[--segment] [--no-segment] [--skip-spaces-word]
[--no-skip-spaces-word] [--skip-spaces-pron]
[--no-skip-spaces-pron] [--tone] [--no-tone]
key
wikipron: error: unrecognized arguments: --no-skip-parens
It looks like this feature hasn't been released yet. If you need it now, do:
pip uninstall wikipron
pip install git+https://github.com/CUNY-CL/wikipron.git
@jacksonllee should we do a new PyPI release?
Yeah, we should. It looks like there are a couple of packaging / maintenance things we should update as well. Let me put together a PR in the next couple of days to prep for a new release. Stay tuned!
Circling back here: version 1.3.0 of the wikipron package has just been released to PyPI (https://pypi.org/project/wikipron/1.3.0/).
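After upgrading (for example with `pip install --upgrade wikipron`), a quick way to confirm which release is installed. This is a stdlib-only sketch with hypothetical helper names; the 1.3.0 threshold is an assumption based on this thread, namely that 1.3.0 is the first PyPI release whose CLI accepts `--no-skip-parens`. Running `wikipron -h` and looking for the flag in the usage text is an equally quick check.

```python
from importlib.metadata import PackageNotFoundError, version

# Hypothetical helpers; the "1.3.0" default is an assumption taken from this thread.


def parse_version(text: str) -> tuple[int, ...]:
    """Turn "1.3.0" into (1, 3, 0), stopping at the first non-numeric part,
    so pre-release suffixes such as "rc1" are ignored (a simplification)."""
    parts: list[int] = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def at_least(installed: str, minimum: str) -> bool:
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "1.3" compares equal to "1.3.0".
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def wikipron_at_least(minimum: str = "1.3.0") -> bool:
    """True if the installed wikipron distribution is at least `minimum`."""
    try:
        return at_least(version("wikipron"), minimum)
    except PackageNotFoundError:
        return False


print(wikipron_at_least())
```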