panphon
Is long i closer in phoneme feature space to long o than to long e?
I'm attempting to compensate for the poor handling of proper names by Google's speech transcription. For instance, it transcribes "steen" as "stein", which causes directory lookups to fail. As a workaround, I phonemize the surnames in the directory and pick the one with the minimum dogol_prime_distance. In this case I run into the following problem:
dogol_prime_distance('stˈiːn', 'stˈa͡ɪn') = 1  # comparing with "steen"
dogol_prime_distance('stˈo͡ʊn', 'stˈa͡ɪn') = 0  # comparing with "stone"
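A minimal sketch of the lookup described above, assuming each phonemization has already been split into segments with stress marks dropped. `levenshtein`, `best_match`, and the toy directory are hypothetical names for illustration. `distance` is a pluggable parameter where something like dogol_prime_distance, or a finer-grained feature distance, would go.

```python
from typing import Callable, Dict, Sequence


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Unit-cost edit distance between two segment sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # match (0) or substitute (1)
        prev = cur
    return prev[-1]


def best_match(heard: Sequence[str],
               directory: Dict[str, Sequence[str]],
               distance: Callable[[Sequence[str], Sequence[str]], float] = levenshtein) -> str:
    """Return the surname whose phonemization is closest to `heard`.

    Ties go to whichever name comes first in the directory's iteration order.
    """
    return min(directory, key=lambda name: distance(heard, directory[name]))


# Hypothetical directory: phonemizations pre-split into segments, stress marks dropped.
directory = {
    "Smith": ["s", "m", "ɪ", "θ"],
    "Steen": ["s", "t", "iː", "n"],
}
heard = ["s", "t", "a͡ɪ", "n"]  # phonemization of the misrecognized "stein"
print(best_match(heard, directory))  # Steen
```

Plain segment-level Levenshtein treats every substitution alike, so if "Stone" (`["s", "t", "o͡ʊ", "n"]`) were also in the directory it would tie with "Steen" at distance 1. The choice of distance function is what decides such cases.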
Perhaps I'm misunderstanding how the phoneme feature vectors are supposed to operate under the Dolgopolsky equivalency classes, but it seems fairly clear that long i (/a͡ɪ/) should be closer in feature space to long e (/iː/) than to long o (/o͡ʊ/).
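To see why a class-based distance is coarse, here is a toy version of the idea: map each segment to a sound class, then take the edit distance over the class strings. It assumes, as in common Dolgopolsky-style sound-class schemes, that all vowels fall into a single class "V". The class table, the pre-split segments, and the function names are hypothetical and are not panphon's code.

```python
from typing import Dict, List, Sequence

# Hypothetical, simplified sound-class table; NOT panphon's actual Dolgopolsky-prime table.
# Assumption: every vowel, monophthong or diphthong, collapses into the single class "V".
TOY_CLASSES: Dict[str, str] = {
    "s": "S", "t": "T", "n": "N",
    "iː": "V", "a͡ɪ": "V", "o͡ʊ": "V",
}


def to_classes(segments: Sequence[str]) -> List[str]:
    """Replace each segment with its (toy) sound class."""
    return [TOY_CLASSES[seg] for seg in segments]


def levenshtein(a: Sequence[str], b: Sequence[str]) -> int:
    """Unit-cost edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def class_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Edit distance computed on sound classes instead of on the segments themselves."""
    return levenshtein(to_classes(a), to_classes(b))


stein = ["s", "t", "a͡ɪ", "n"]
steen = ["s", "t", "iː", "n"]
stone = ["s", "t", "o͡ʊ", "n"]
print(class_distance(steen, stein), class_distance(stone, stein))  # 0 0
```

Under this toy table, "steen" and "stone" are both at distance 0 from "stein", so the class mapping by itself cannot rank them. If panphon's classes treat vowels this way, a 1 versus 0 split like the one reported would have to come from something other than the vowel collapse, such as how the strings are segmented.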
PS: About an hour after I originally posted this issue, Google started returning 'steen' rather than 'stein'. I wonder if their web crawler is more intelligent than people think.
@jabowery, I'll try to look into this later. It seems there is a bug. However, you should not be using Dolgopolsky equivalency classes for this application. This method is much too coarse-grained to be very informative. Have you looked into the feature edit distance methods?
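A rough sketch of what a feature edit distance does differently: substitution cost is the fraction of phonological features that differ, so similar vowels are cheap to swap and dissimilar ones are expensive. The five binary features and their values below are hypothetical illustrations, not panphon's feature table, and the segmentation (length dropped, diphthongs split into two vowels) is likewise an assumption.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical binary features (high, low, back, round, tense); illustrative values only,
# NOT panphon's feature table.
TOY_FEATURES: Dict[str, Tuple[int, ...]] = {
    "i": (1, 0, 0, 0, 1),
    "ɪ": (1, 0, 0, 0, 0),
    "a": (0, 1, 0, 0, 0),
    "o": (0, 0, 1, 1, 1),
    "ʊ": (1, 0, 1, 1, 0),
}


def sub_cost(x: str, y: str) -> float:
    """0 for identical segments, fraction of differing features otherwise, 1 if unknown."""
    if x == y:
        return 0.0
    fx, fy = TOY_FEATURES.get(x), TOY_FEATURES.get(y)
    if fx is None or fy is None:
        return 1.0
    return sum(p != q for p, q in zip(fx, fy)) / len(fx)


def feature_edit_distance(a: Sequence[str], b: Sequence[str], indel: float = 1.0) -> float:
    """Levenshtein-style DP where substitutions are weighted by feature mismatch."""
    prev = [j * indel for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        cur = [i * indel]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + indel,                   # delete x
                           cur[j - 1] + indel,                # insert y
                           prev[j - 1] + sub_cost(x, y)))     # weighted substitution
        prev = cur
    return prev[-1]


# Assumption: long vowel reduced to its quality, diphthongs split into two segments.
stein = ["s", "t", "a", "ɪ", "n"]
steen = ["s", "t", "i", "n"]
stone = ["s", "t", "o", "ʊ", "n"]
print(sub_cost("i", "ɪ"), sub_cost("i", "a"))  # 0.2 0.6
print(round(feature_edit_distance(steen, stein), 2),
      round(feature_edit_distance(stone, stein), 2))  # 1.2 1.2
```

With these toy values both candidates land at 1.2, so the sketch does not settle which name is closer; that depends on the real feature table, the indel cost, and how length and diphthongs are encoded. What it does show is that vowel substitutions are graded (i to ɪ costs 0.2, i to a costs 0.6) instead of all vowels being interchangeable as in a class-based mapping.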