[arm] optional material in parentheses

Open jhdeov opened this issue 4 years ago • 3 comments

There are Armenian entries which show optional material in parentheses. For example, any initial sibilant-stop cluster gets an obligatory schwa in Western Armenian, but an optional one in Eastern Armenian: [(ə)stɑˈnɑl].

The script seems to skip parentheses during extraction, so the extracted pronunciation of ստանալ for Eastern is [ə s t a n a l] instead of [(ə) s t a n a l].

jhdeov avatar Jan 10 '21 05:01 jhdeov

So "optionality in parentheses", while not an uncommon thing, is not really mentioned in the Wiktionary pronunciation specs that I see.

I don't know what WikiPron should do here. We try not to do too many hermeneutics with the pronunciations; if they're not clear, we ought to fix them upstream and rescrape. I would say this is a good case of "not clear": it will make sense to linguists of a certain stripe, and perhaps speakers of the relevant dialects, but it would be much better to just list the logical possibilities instead of adding a sort of regular language of pronunciations.
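To make "list the logical possibilities" concrete, here is a minimal sketch of how a transcription with optional parenthesized material could be expanded into its variants. This is not part of WikiPron; the function name `expand_optional` is hypothetical, and the sketch assumes parentheses are balanced and never nested.

```python
import itertools
import re
from typing import List


def expand_optional(pron: str) -> List[str]:
    """Return every variant obtained by keeping or dropping each (...) group.

    Hypothetical helper, not a WikiPron function. Assumes non-nested,
    balanced parentheses.
    """
    # re.split with a capture group alternates obligatory text (even indices)
    # with the contents of each parenthesized group (odd indices).
    parts = re.split(r"\(([^()]*)\)", pron)
    choices = [
        [part] if i % 2 == 0 else [part, ""]  # optional parts may be dropped
        for i, part in enumerate(parts)
    ]
    return ["".join(combo) for combo in itertools.product(*choices)]


print(expand_optional("(ə)stɑˈnɑl"))
```

For the Eastern Armenian example this yields two entries, one with the schwa and one without, which is roughly what listing both pronunciations upstream would look like.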

If you are looking at the static scrape in data/ it's also possible the Wiktionary data changed since we last ran that, since it seems like Armenian is under a lot of active development.

kylebgorman avatar Jan 10 '21 22:01 kylebgorman

It's understandable that you don't want to do much hermeneutics. And ideally this is something the Wiktionary entries themselves should fix. But I don't think it'll be changed any time soon :/

For WikiPron, I'm curious if there's a way to get a completely unfiltered extraction from an entry. So for example, when I last ran WikiPron a few days ago, the headword ստանալ [(ə)stanal] was extracted as [ə s t a n a l]. Is it possible to play with some parameter so that the extracted form is [(ə) s t a n a l] or [( ə ) s t a n a l], i.e., the script treats the parentheses as separate "IPA" symbols?
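To illustrate the two output shapes being asked about, here is a rough sketch of a segmenter that keeps parentheses. It is not how WikiPron segments pronunciations; the function `segment` and its `split_parens` parameter are hypothetical, and the segmentation is naive (one character per segment, with combining marks attached to the preceding character).

```python
import re
import unicodedata
from typing import List


def segment(pron: str, split_parens: bool = True) -> str:
    """Space-separate a pronunciation while keeping parentheses.

    Hypothetical helper, not WikiPron's segmenter. With split_parens=True each
    parenthesis is its own symbol; otherwise a whole "(...)" group is one token.
    """
    tokens: List[str] = []
    # Match either a complete non-nested "(...)" group or a single character.
    for match in re.finditer(r"\([^()]*\)|.", pron):
        piece = match.group()
        if len(piece) > 1 and not split_parens:
            tokens.append(piece)  # keep e.g. "(ə)" together
            continue
        for ch in piece:
            # Attach combining diacritics to the previous segment.
            if unicodedata.combining(ch) and tokens and tokens[-1] not in ("(", ")"):
                tokens[-1] += ch
            else:
                tokens.append(ch)
    return " ".join(tokens)


print(segment("(ə)stanal"))                      # parentheses as separate symbols
print(segment("(ə)stanal", split_parens=False))  # optional group kept as one token
```

Either form keeps the optionality visible in the output, so downstream users can decide how to interpret it.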

jhdeov avatar Jan 11 '21 02:01 jhdeov

Nothing exists for that yet but I suppose it could be built. The extraction itself isn’t terribly complex. The first version of the script was maybe 30 lines!

kylebgorman avatar Jan 11 '21 02:01 kylebgorman

Hi. The README recommends the flag --no-skip-parens. I just installed wikipron with pip (and pip3), but the argument --no-skip-parens is not recognized:

usage: wikipron [-h] [--phonetic] [--stress] [--no-stress]
                [--syllable-boundaries] [--no-syllable-boundaries]
                [--dialect DIALECT] [--casefold] [--cut-off-date CUT_OFF_DATE]
                [--segment] [--no-segment] [--skip-spaces-word]
                [--no-skip-spaces-word] [--skip-spaces-pron]
                [--no-skip-spaces-pron] [--tone] [--no-tone]
                key
wikipron: error: unrecognized arguments: --no-skip-parens
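One way to check whether an installed release knows a given flag before relying on it is to look for the flag in the `--help` text. The sketch below is only an illustration: both function names are hypothetical, the usage string is abbreviated from the output above, and `installed_cli_supports` assumes the `wikipron` command is on your PATH.

```python
import re
import subprocess


def help_mentions_flag(help_text: str, flag: str) -> bool:
    """True if `flag` appears as a whole option name in argparse-style help."""
    # The lookarounds stop "--skip-spaces" from matching "--skip-spaces-word".
    pattern = r"(?<![\w-])" + re.escape(flag) + r"(?![\w-])"
    return re.search(pattern, help_text) is not None


def installed_cli_supports(flag: str, command: str = "wikipron") -> bool:
    """Run `<command> --help` and look for the flag (needs the CLI installed)."""
    result = subprocess.run(
        [command, "--help"], capture_output=True, text=True, check=False
    )
    return help_mentions_flag(result.stdout, flag)


# Usage text abbreviated from the error output quoted in this comment.
USAGE = """usage: wikipron [-h] [--phonetic] [--stress] [--no-stress]
                [--segment] [--no-segment] [--skip-spaces-word]
                [--no-skip-spaces-word] [--tone] [--no-tone]
                key"""

print(help_mentions_flag(USAGE, "--no-skip-parens"))
print(help_mentions_flag(USAGE, "--no-segment"))
```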

jhdeov avatar Nov 03 '22 20:11 jhdeov

It looks like this feature hasn't been released yet. If you need it now, do:

pip uninstall wikipron
pip install git+https://github.com/CUNY-CL/wikipron.git

@jacksonllee should we do a new PyPI release?

kylebgorman avatar Nov 04 '22 12:11 kylebgorman

@jacksonllee should we do a new PyPI release?

Yeah, we should. It looks like there are a couple of packaging / maintenance things we should update as well. Let me put together a PR in the next couple of days to prep for a new release. Stay tuned!

jacksonllee avatar Nov 05 '22 01:11 jacksonllee

Circling back here -- The wikipron package version 1.3.0 has just been released to PyPI: https://pypi.org/project/wikipron/1.3.0/

jacksonllee avatar Nov 28 '22 19:11 jacksonllee