Feature vectors for allophones that aren't phonemes

Open lggruspe opened this issue 1 year ago • 4 comments

Some segments appear in the PHOIBLE data as allophones, but not as phonemes in any language.

Examples:

  • tʃː is an allophone for t̠ʃ in kuna1268
  • tʂ is an allophone for t̠ʃ in yuch1247
  • tʂʼ is an allophone for t̠ʃʼ in yuch1247

phoible.csv doesn't seem to have feature vectors for these allophones.

lggruspe avatar Apr 18 '23 07:04 lggruspe
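
One way to list such segments is sketched below. It is a minimal sketch, not PHOIBLE tooling: it assumes the data has `Glottocode`, `Phoneme` and `Allophones` columns, with allophones stored as a space-separated list, and it runs on a toy sample rather than the real phoible.csv. The function name is hypothetical.

```python
import csv
import io

# Toy stand-in for phoible.csv. The column names and the space-separated
# allophone list are assumptions made for this sketch.
SAMPLE = """Glottocode,Phoneme,Allophones
kuna1268,t̠ʃ,t̠ʃ tʃː
yuch1247,t̠ʃʼ,t̠ʃʼ tʂʼ
yuch1247,ʈʂ,ʈʂ
"""


def allophones_without_phoneme_rows(rows):
    """Map each allophone that never occurs as a phoneme to its sources.

    Each source is a (phoneme, glottocode) pair taken from the row that
    lists the allophone.
    """
    rows = list(rows)
    # Every segment that has its own phoneme row (and so a feature vector).
    phonemes = {row["Phoneme"] for row in rows}
    missing = {}
    for row in rows:
        for allophone in row["Allophones"].split():
            if allophone not in phonemes:
                source = (row["Phoneme"], row["Glottocode"])
                missing.setdefault(allophone, set()).add(source)
    return missing


missing = allophones_without_phoneme_rows(csv.DictReader(io.StringIO(SAMPLE)))
for segment, sources in sorted(missing.items()):
    print(segment, sorted(sources))
```

To try it on the real file, the `io.StringIO(SAMPLE)` argument would be swapped for an open file handle, provided the column names match the assumptions above.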

That's correct. We have a student working on this right now. But we're not sure yet how to provide them; they can't be part of phoible.csv because it has one row per phoneme (not one per allophone). Can you tell us about your use case / what would be the best format from your perspective?

drammock avatar Apr 18 '23 14:04 drammock

I was only looking to compare the features of tʂ with ʈʂ. PHOIBLE uses both symbols (possibly to represent different sounds), but Wikipedia says they represent the same sound.

lggruspe avatar Apr 19 '23 15:04 lggruspe

tʂ looks like a mistake to me; we try to enforce that affricates have place-matching between the stop part and the fricative part. Such mistakes are more likely in the allophones because they aren't run through the same validation code that the phonemes are. As I said, we have a student working on this right now, so hopefully many of these allophone errors will be corrected soon.

cc @Alessioryan

drammock avatar Apr 19 '23 16:04 drammock
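
The place-matching rule mentioned above could be checked along these lines. This is only an illustrative sketch: the actual PHOIBLE validation code is not shown in this thread, the two lookup tables are a tiny hand-written assumption covering a few coronal symbols, and the function names are hypothetical.

```python
# Hypothetical place-of-articulation tables for a handful of coronal symbols.
# "t̠" and "d̠" are the base letter plus the combining retraction diacritic.
STOP_PLACE = {
    "t": "alveolar", "d": "alveolar",
    "t̠": "postalveolar", "d̠": "postalveolar",
    "ʈ": "retroflex", "ɖ": "retroflex",
}
FRICATIVE_PLACE = {
    "s": "alveolar", "z": "alveolar",
    "ʃ": "postalveolar", "ʒ": "postalveolar",
    "ʂ": "retroflex", "ʐ": "retroflex",
}


def split_affricate(segment):
    """Return (stop, fricative) if the segment starts with a known pair, else None."""
    # Try longer stop symbols first so "t̠" is matched before plain "t".
    for stop in sorted(STOP_PLACE, key=len, reverse=True):
        if segment.startswith(stop):
            rest = segment[len(stop):]
            if rest and rest[0] in FRICATIVE_PLACE:
                return stop, rest[0]
    return None


def place_mismatch(segment):
    """True if the stop and fricative parts have different places in the tables."""
    parts = split_affricate(segment)
    if parts is None:
        return False  # not an affricate this sketch knows about
    stop, fricative = parts
    return STOP_PLACE[stop] != FRICATIVE_PLACE[fricative]


for segment in ["tʂʼ", "ʈʂ", "t̠ʃ", "tʃː"]:
    print(segment, place_mismatch(segment))
```

Trailing diacritics such as the ejective or length marks are ignored, since only the first two parts of the segment are compared.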

@drammock Would you be able to send me the validation code for the phonemes? I'd love to take a look at this issue; I hadn't noticed it before.

Alessioryan avatar Jun 28 '23 00:06 Alessioryan