Questions about the definition of "segment"

Open Xiaoxi-Luo-CL opened this issue 7 months ago • 1 comments

Hello, thank you for your awesome work! I have several questions about "segment".

  1. How do you define "segment", and how do you distinguish a "segment" from a "consonant cluster"? For example, I saw segments like [mb] and [ŋmɡb] in the dataset, but common consonant combinations like [st] in English are not included. Therefore, I am quite interested in the definition of "segment".

  2. Based on 1), for segments with multiple phonemes (like [ŋmɡb]), how are their features derived? What is the relationship between features of the segment as a whole, and features of each phoneme?

I look forward to your response. Thank you!

Xiaoxi-Luo-CL avatar Jun 08 '25 15:06 Xiaoxi-Luo-CL

"Segment" is basically equivalent to "phoneme" for most people's purposes. As for which sequences of consonants get counted as a "segment": we follow the analysis of the documenting linguist(s). Broadly speaking, evidence that a group of consonants should be analyzed as a single phoneme (versus a sequence of 2 or more phonemes) might be:

  • the existence of a minimal pair (made-up example: /am.ba/ vs /am͜ba/)
  • distributional evidence (e.g., it occurs in contexts in which the language's phonotactics disallows consonant sequences)
  • phonetic evidence (e.g., there are reliable acoustic or articulatory differences that mark it as different from consonant sequences in the language)

Regarding features, we use what we call "contour features" for phonemes like ŋmɡb where feature values change through the segment. This is implemented as a comma-separated sequence of feature values, like nasal: +,-.
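
As a rough illustration of how such comma-separated values could be read programmatically, here is a minimal Python sketch. The function names `parse_contour` and `is_contour` and the example dictionary are hypothetical and not part of the dataset's tooling. Only the `nasal: +,-` value comes from the description above; the `consonantal` entry is a made-up non-contour value for contrast.

```python
def parse_contour(value: str) -> list[str]:
    """Split a feature value such as "+,-" into its ordered sub-values."""
    return [part.strip() for part in value.split(",")]


def is_contour(value: str) -> bool:
    """Return True when the value changes over the course of the segment."""
    return len(set(parse_contour(value))) > 1


# Hypothetical feature entries for a single segment (illustrative only).
features = {"nasal": "+,-", "consonantal": "+"}

# Keep only the features whose value changes through the segment.
contours = {
    name: parse_contour(value)
    for name, value in features.items()
    if is_contour(value)
}
print(contours)  # {'nasal': ['+', '-']}
```

Under this reading, each position in the comma-separated list corresponds to one successive phase of the segment, so `nasal: +,-` describes a segment that starts nasal and ends non-nasal.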

drammock avatar Jun 10 '25 18:06 drammock