stop supporting both superior and inferior devoicing ring

Open • drammock opened this issue 5 years ago • 5 comments

migrated from clld/phoible#20; cc @nardog

There exist pairs of segments that represent the exact same sounds but differ in the placement of diacritics:

  • j̊ j̥
  • ŋ̊ ŋ̥
  • d̥ʒ̊ d̥ʒ̥

etc. I also see transcriptions with stacked diacritics like l̪̥, but if you allow superscript versions of canonically subscript diacritics at all, why not transcribe them like l̪̊?

And why is the devoiced diacritic the only one you use two different versions of? For example, it seems to me ɹ̪̩ is better transcribed ɹ̪̍ (again, if you allow superscript versions at all).
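To make the problem concrete: the two rings are separate Unicode characters, and canonical normalization does not unify them, so j̊ and j̥ are simply different strings to any software that compares segments. The following Python sketch is an illustration added here, not PHOIBLE code, and uses only the standard `unicodedata` module. It prints the code points and combining classes for the j̊/j̥ pair and compares the two under NFC and NFD:

```python
import unicodedata

RING_ABOVE = "\u030A"  # COMBINING RING ABOVE: the superior devoicing ring
RING_BELOW = "\u0325"  # COMBINING RING BELOW: the inferior devoicing ring

def describe(segment: str) -> list[tuple[str, str, int]]:
    """List (code point, Unicode name, canonical combining class) per character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), unicodedata.combining(ch))
        for ch in segment
    ]

superior = "j" + RING_ABOVE  # j̊
inferior = "j" + RING_BELOW  # j̥

for seg in (superior, inferior):
    print(seg, describe(seg))

# Canonical normalization (NFC/NFD) treats the two rings as unrelated characters.
for form in ("NFC", "NFD"):
    equal = unicodedata.normalize(form, superior) == unicodedata.normalize(form, inferior)
    print(form, "equal:", equal)
```

Neither ring has a canonical decomposition that relates it to the other, so both normalization forms leave the pair distinct; only an explicit project-level convention can make them the same segment.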

drammock, May 10 '19 19:05

Thanks for pointing this out. U+030D is not, to my knowledge, used as a synonym for U+0329, so I'm -1 on allowing things like ɹ̪̍.

I believe that early on we were stricter about using only one of the devoicing ring variants, but that seems to have slipped? Or maybe I'm just misremembering. In any event, I'm +1 on standardizing on either U+030A or U+0325. @nardog @bambooforest do you have suggestions as to which one we should pick?

@bambooforest do you have any objections to standardizing on one of them? The only downside I can think of is rendering quality on phoible.org, and it's not obvious to me that standardizing would make it appreciably worse.

drammock, May 10 '19 19:05

@drammock:

> U+030D is not, to my knowledge, used as a synonym for U+0329

What makes you say that? The ŋ̊ seen on the IPA chart is clearly only an exemplar, as it is preceded by "e.g." See IPA (1993). This allows the use of U+0357 (more rounded), U+0351 (less rounded), U+030D (syllabic), and U+0311 (non-syllabic); U+0346 is ambiguous, as it represents dentolabial rather than dental in extIPA. Obviously the tilde, háček, etc. have other values when superior, so they can't be used. The editor of IPA (1993) does note "a few potential problems that may need to be addressed in a future vote", but all they have done AFAIK is change the wording from "Diacritics may be placed above..." to "Some diacritics..." in the 2015 chart.

So my suggestion would be: either use the superior versions of all diacritics for which they are available (i.e. voiceless, more/less rounded, syllabic and non-syllabic) whenever the canonical, inferior version of a diacritic interferes with a descender or another inferior diacritic, or don't use them at all, including the overring. If the former, which letters have descenders must be explicitly defined (e.g. does ǀ have a descender?).

I've also noticed PHOIBLE doesn't use ˒ ˓ ˖ ˗ ˔ ˕. At least ˔ ˕ are officially sanctioned by the IPA Handbook, Appendix 2. If you use the superior ring at all, I suggest you also use at least ˔ ˕ (if not all of ˒ ˓ ˖ ˗ ˔ ˕) alongside ◌͗ ◌͑ ◌̍ ◌̑ whenever a canonical diacritic interferes with a descender or another diacritic.

Overall, I err on the side of not using the superior variants at all. That would give a one-to-one correspondence between symbol and value, which saves both users and compilers the hassle of figuring out which version of a diacritic is (or should be) used in a particular position.

nardog, May 10 '19 20:05

> What makes you say that? ŋ̊ seen on the IPA chart is clearly only an exemplar, as it is preceded by "e.g." See IPA (1993). This allows for use of U+0357, U+0351, U+030D, U+0311

I stand corrected on this point. Regardless, I think we're in agreement that the most sensible way forward is to reduce the number of diacritics we use, rather than including several more variants, and thus to eliminate our use of U+030A.

It is also (fortunately) the far easier path; a simple find-replace and a couple of lines of testing code ought to do it. Supporting all superior diacritics would not be too hard, but systematically changing the diacritics on all letters with descenders would be... let's say "an annoying task". Supporting the modifier-letter versions ˒ ˓ ˖ ˗ ˔ ˕ would be a nightmare, because of how we assign features based on glyph type (we distinguish between diacritics and modifier letters in the feature-assignment code).
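A minimal sketch of what that find-replace could look like, assuming segments are handled as Python strings; `standardize_ring` is a hypothetical name, not code from the PHOIBLE repository. It decomposes each segment to NFD before swapping U+030A for U+0325, so that a precomposed letter such as å (U+00E5) is caught too, and checks the result against the pairs listed at the top of this issue:

```python
import unicodedata

RING_ABOVE = "\u030A"  # COMBINING RING ABOVE: superior devoicing ring, to be eliminated
RING_BELOW = "\u0325"  # COMBINING RING BELOW: inferior devoicing ring, the standard

def standardize_ring(segment: str) -> str:
    """Hypothetical helper: rewrite every superior devoicing ring as an inferior one.

    The segment is decomposed to NFD first, so a precomposed letter such as
    U+00E5 (å) exposes its combining ring and is rewritten too.
    """
    decomposed = unicodedata.normalize("NFD", segment)
    return decomposed.replace(RING_ABOVE, RING_BELOW)

# (superior, inferior) pairs from the top of this issue.
pairs = [
    ("j\u030A", "j\u0325"),                          # j̊ vs j̥
    ("\u014B\u030A", "\u014B\u0325"),                # ŋ̊ vs ŋ̥
    ("d\u0325\u0292\u030A", "d\u0325\u0292\u0325"),  # d̥ʒ̊ vs d̥ʒ̥
]
for superior, inferior in pairs:
    fixed = standardize_ring(superior)
    print(f"{superior} -> {fixed} matches {inferior}: {fixed == inferior}")
```

The result is deliberately left decomposed: recomposing with NFC would turn ḁ into the precomposed U+1E01 (ḁ), while j̥ and ŋ̥ have no precomposed forms, so staying in NFD gives every segment a single representation.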

drammock, May 10 '19 20:05

We discussed this issue in Moran & Cysouw 2018, pp. 58-59:

Finally, the IPA states that “diacritics may be placed above a symbol with a descender”. For example, for marking of voiceless pronunciation of voiced segments the IPA uses the ring diacritic. Originally, the ring should be placed below the base character, like in <m̥>, using the combining ring below at U+0325. However, in letters with long descenders the IPA also allows to put the ring above the base, like in <ŋ̊>, using the combining ring above at U+030A. Yet, proper font design does not have any problem with rendering the ring below the base character, like in <ŋ̥>, so for strict IPA encoding we propose to standardize on the ring below. As a principle, for strict IPA encoding only one option should be allowed for all diacritics.

The variable encoding as allowed by the IPA becomes even more troublesome for the tilde and diaeresis diacritics. In these cases, the IPA itself attaches different semantics to the symbols above and below a base character. The tilde above a character (like in <ã>, using the combining tilde at U+0303) is used for nasalization, while the tilde below a character (like in <a̰>, using the combining tilde below at U+0330) indicates creaky voice. Likewise, the diaeresis above (like in <ä>, using the combining diaeresis at U+0308) is used for centralization, while the diaeresis below a character (like in <a̤>, using the combining diaeresis below at U+0324) indicates breathy voice. These cases strengthen our plea to not allow diacritics to switch position for typographic convenience.
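The quoted argument can be restated as a small lookup table: only the ring keeps its IPA value when moved from above to below, while the tilde and diaeresis change meaning. This Python sketch encodes the values named in the quoted passage in hypothetical `IPA_VALUE` and `BELOW_FORM` tables (illustrative only, not part of PHOIBLE) and checks which above-to-below moves preserve the value:

```python
import unicodedata

# IPA value of each combining mark, as described in the quoted passage.
# Hypothetical table for illustration only.
IPA_VALUE = {
    "\u0303": "nasalized",       # COMBINING TILDE
    "\u0330": "creaky voiced",   # COMBINING TILDE BELOW
    "\u0308": "centralized",     # COMBINING DIAERESIS
    "\u0324": "breathy voiced",  # COMBINING DIAERESIS BELOW
    "\u030A": "voiceless",       # COMBINING RING ABOVE
    "\u0325": "voiceless",       # COMBINING RING BELOW
}

# Each "above" mark mapped to its "below" counterpart of the same shape.
BELOW_FORM = {"\u0303": "\u0330", "\u0308": "\u0324", "\u030A": "\u0325"}

def lowering_preserves_value(mark: str) -> bool:
    """True if moving `mark` to its below form keeps the same IPA value."""
    return IPA_VALUE[mark] == IPA_VALUE[BELOW_FORM[mark]]

for mark, below in BELOW_FORM.items():
    print(unicodedata.name(mark), "->", unicodedata.name(below),
          "safe:", lowering_preserves_value(mark))
```

Only the ring passes, which is why standardizing on one position has to be decided diacritic by diacritic rather than by a blanket "move everything below" rule.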

The superior and inferior rings (and similar variants) seem to come from different transcription conventions in different sources (e.g. EA uses the superior ring), and I made a mistake in not catching them:

https://github.com/bambooforest/phoible-scripts/blob/master/segment-conventions/check-for-segment-errors.md

This should set up a path towards some tests.

I'm +1 on following @drammock's suggestion (and the so-called "strict IPA" encoding that we defined on p. 77), i.e. using the inferior position for every diacritic that can take either.

@drammock: there are a few other incorrect segment conventions, e.g. <ŋ̊ŋ>, that I will go in and fix in the input data sources (here UZ) and submit as a PR.

bambooforest, May 11 '19 09:05

note to self:

> phoible %>% filter(grepl("̊", Phoneme)) %>% select(Phoneme, InventoryID, Source)
   Phoneme InventoryID Source
1       ŋ̊ǀ        1379     gm
2       ŋ̊ǁ        1379     gm
3       ŋ̊ǂ        1379     gm
4       ŋ̊ǃ        1379     gm
5       ŋ̊ʘ        1379     gm
6        å        1508     gm
7        ɾ̪̊        1760     ra
8       ɾ̪̊ʰ        1760     ra
9        ɡ̊        2167     uz
10      ɡ̊ʰ        2167     uz
11       ɲ̊        2201     uz
12       ŋ̊        2201     uz
13       j̊        2242     ea
14      ɡ̊ː        2260     ea
15       ɡ̊        2272     ea
16       ɣ̊        2272     ea
...
85       ŋ̊        2614     ea
86       ɡ̊        2626     ea
87      ɡ̊ʷ        2626     ea

with the majority of the culprits being EA. TODO:

  • update EA_IPA correspondences
  • fix input in UZ and GM
  • update RA mappings
  • write tests to catch this (@drammock where should we put them?)
  • regen the data

bambooforest, Jul 24 '19 14:07