
ə and ɜ have the exact same features

Open • DanielSWolf opened this issue 3 years ago • 1 comment

In component-feature-table.csv, the segments ə (mid central vowel) and ɜ (open-mid central unrounded vowel) have the exact same features:

ə,0259,0,-,+,-,-,-,+,+,0,+,-,-,-,-,-,0,0,-,0,0,0,+,-,-,-,-,-,-,-,+,-,-,-,0,-,-,0
ɜ,025C,0,-,+,-,-,-,+,+,0,+,-,-,-,-,-,0,0,-,0,0,0,+,-,-,-,-,-,-,-,+,-,-,-,0,-,-,0

I assume that's not on purpose, given that the FAQ states that "if two phonemes differ in their graphemic representation, then they should necessarily differ in their featural representation as well".
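To confirm that the two rows really agree on every feature, you can parse them and compare them column by column. Here is a minimal base-R sketch. It assumes the rows are pasted inline exactly as above: column 1 is the segment, column 2 its code point, and the remaining 37 columns are feature values. Every column is read as a plain string.

```r
# The two rows from component-feature-table.csv, copied verbatim (no header row)
rows_text <- c(
  "ə,0259,0,-,+,-,-,-,+,+,0,+,-,-,-,-,-,0,0,-,0,0,0,+,-,-,-,-,-,-,-,+,-,-,-,0,-,-,0",
  "ɜ,025C,0,-,+,-,-,-,+,+,0,+,-,-,-,-,-,0,0,-,0,0,0,+,-,-,-,-,-,-,-,+,-,-,-,0,-,-,0"
)

# Read all columns as character so "+", "-", "0" and "0259" are kept verbatim
rows <- read.csv(text = rows_text, header = FALSE, colClasses = "character")

# Drop segment and code point; keep only the feature values of each row
f1 <- unlist(rows[1, -(1:2)])
f2 <- unlist(rows[2, -(1:2)])

# Positions of features on which the two segments differ
differing <- which(f1 != f2)
length(differing)  # 0: ə and ɜ agree on all 37 features
```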

DanielSWolf, Mar 17 '22 06:03

@DanielSWolf -- indeed. This is a problem that we are aware of (hence the "should"). The problem is also pervasive.

library(tidyverse)

# Load PHOIBLE: one row per phoneme per inventory
df <- read_csv('https://raw.githubusercontent.com/phoible/dev/master/data/phoible.csv')

# Keep each distinct phoneme with its feature columns (tone ... click),
# and drop tones, which have no features of their own
df <- df %>%
  select(Phoneme, tone:click) %>%
  distinct() %>%
  filter(tone != "+")

# Group phonemes that share an identical feature vector
out <- df %>%
  group_by(across(tone:click)) %>%
  summarize(phonemes = paste0(Phoneme, collapse = ', '), count = n()) %>%
  ungroup()
out %>% select(phonemes, count) %>% filter(count > 1) %>% arrange(desc(count))

   phonemes                       count
   <chr>                          <int>
 1 t, t͉, t̠, t̺, t̟, d̥, t̪̺, d̺̥, t̺͉          9
 2 t̻s̻, t̪s̪, ts̪, t̪s, t̪̻s̪̻, t̟ʃ̟, ts̻, t̻s̪̻     8
 3 t̠ʃ, t̠ʃ͉, t̠͉ʃ, d̥ʒ̥, t̻ʃ̻, d̥ʒ̊, ʈ̻ʂ̻         7
 4 ts, t͉s, t̺s̺, t̟s̟, d̥z̥, d̺̥z̺̥, ts̺         7
 5 d̻z̻, d̪z̪, dz̪, d̪ʒ, d̟ʒ̟, d̪z, dz̻         7
 6 ʃ, ʃ͉, ʒ̊, s̠, s̺̠, s̻̠, ʂ̻                7
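The tidyverse pipeline above needs the full PHOIBLE download. To check a handful of segments offline, you can sketch the same idea in base R: group segments by their whole feature vector and keep groups with more than one member. `find_feature_collisions` is a hypothetical helper. The toy table uses made-up segment labels and feature values, not PHOIBLE's.

```r
# Hypothetical helper: group segments whose feature vectors are identical.
# `df` has a `Phoneme` column plus one character column per feature.
find_feature_collisions <- function(df, feature_cols) {
  # Collapse each row's feature values into a single key string
  key <- do.call(paste, c(df[feature_cols], sep = ","))
  groups <- split(df$Phoneme, key)
  # Keep only keys shared by more than one segment
  collisions <- groups[lengths(groups) > 1]
  data.frame(
    phonemes = vapply(collisions, paste, character(1), collapse = ", "),
    count = lengths(collisions),
    row.names = NULL
  )
}

# Toy input: labels and feature values are illustrative only
toy <- data.frame(
  Phoneme = c("A", "B", "C", "D"),
  f1 = c("+", "+", "-", "+"),
  f2 = c("-", "-", "-", "-"),
  f3 = c("0", "0", "+", "0"),
  stringsAsFactors = FALSE
)

find_feature_collisions(toy, c("f1", "f2", "f3"))
```

With the toy input, this returns a single group, "A, B, D", with a count of 3. C differs in f1 and f3, so it is not reported.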

There are several reasons for this, including but probably not limited to:

  • no features for tones (that's why I filter them out above)
  • some phoneme specifications taken from different source documents collapse the feature vectors of distinct phonemes, e.g., ʃ vs. ʒ̊
  • some clicks are difficult to specify with the current feature set
  • plain mistakes that we need to revisit
  • the feature set itself requires some updates

@drammock anything else?

We should make this clearer in the FAQ and on the FEATURES page.

bambooforest, Mar 21 '22 12:03