Questions about the definition of "segment"
Hello, thank you for your awesome work! I have several questions about "segment".
1. How do you define "segment", or how do you distinguish a segment from a consonant cluster? For example, I saw segments like [mb] and [ŋmɡb] in the dataset, but common consonant combinations like [st] in English are not included. Therefore, I am quite interested in the definition of "segment".
2. Based on 1), for segments with multiple phonemes (like [ŋmɡb]), how are their features derived? What is the relationship between the features of the segment as a whole and the features of each phoneme?
I look forward to your response. Thank you!
"Segment" is basically equivalent to "phoneme" for most people's purposes. As for which sequences of consonants get counted as a "segment": we follow the analysis of the documenting linguist(s). Broadly speaking, evidence that a group of consonants should be analyzed as a single phoneme (versus a sequence of 2 or more phonemes) might be:
- the existence of a minimal pair (made-up example: /am.ba/ vs /am͜ba/)
- distributional evidence (e.g., it occurs in contexts in which the language's phonotactics disallows consonant sequences)
- phonetic evidence (e.g., there are reliable acoustic or articulatory differences that mark it as different from consonant sequences in the language)
Regarding features, we use what we call "contour features" for phonemes like /ŋmɡb/, where feature values change over the course of the segment. These are implemented as a comma-separated sequence of feature values, like nasal: +,-.
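
If it helps, here is a minimal sketch of how one might read such comma-separated values in Python. It is not code from the dataset's own tooling: the helper names `parse_contour` and `is_contour` are hypothetical. The `consonantal` entry in the example row is also made up for illustration; only the `nasal: +,-` value comes from the description above.

```python
def parse_contour(value: str) -> list[str]:
    """Split a feature value such as "+,-" into its ordered parts."""
    return [part.strip() for part in value.split(",")]


def is_contour(value: str) -> bool:
    """Treat a value as a contour when it lists more than one part."""
    return len(parse_contour(value)) > 1


# Hypothetical feature row for a complex segment (illustration only).
features = {"nasal": "+,-", "consonantal": "+"}

for name, value in features.items():
    kind = "contour" if is_contour(value) else "simple"
    print(name, parse_contour(value), kind)
```

Under this reading, the order of the parts follows the order in which the values change through the segment, so `+,-` for nasal would mean the segment starts nasal and ends non-nasal.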