panphon

long i closer in phoneme feature space to long o than it is to long e?

Open jabowery opened this issue 3 years ago • 1 comments

I'm attempting to compensate for the poor handling of proper names by Google's speech transcription. For instance, it transcribes "steen" as "stein", resulting in failed directory lookups. My workaround is to phonemize the surnames and then pick the one with the minimum dogol_prime_distance. In this case, I run into the following problem:

dogol_prime_distance('stˈiːn', 'stˈa͡ɪn') = 1 # comparing with "steen"
dogol_prime_distance('stˈo͡ʊn', 'stˈa͡ɪn') = 0 # comparing with "stone"

Perhaps I'm misunderstanding how the phoneme feature vectors are supposed to operate under the Dolgopolsky equivalency classes, but it seems pretty obvious that long i should be closer in feature space to long e than to long o.

PS: About an hour after originally posting this issue, Google started returning 'steen' rather than 'stein'. I wonder if their web crawler has more intelligence than people think.

jabowery, Sep 14 '21 21:09

@jabowery, I'll try to look into this later. It seems there is a bug. However, you should not be using Dolgopolsky equivalency classes for this application. This method is much too coarse-grained to be very informative. Have you looked into the feature edit distance methods?

dmort27, Sep 15 '21 12:09
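
For anyone landing here with the same problem, the feature edit distance suggestion could be tried roughly as follows. This is only a sketch, not the maintainer's recommended code: `rank_surnames` and `panphon_feature_scorer` are hypothetical helper names, and the sketch assumes that `panphon.distance.Distance` exposes a `feature_edit_distance(source, target)` method, as the panphon README describes. The ranking helper itself takes any distance function, so it can be checked without panphon installed.

```python
from typing import Callable, Dict, List, Tuple


def rank_surnames(
    heard_ipa: str,
    directory: Dict[str, str],
    distance: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Return (surname, distance) pairs sorted from closest to farthest.

    `directory` maps each surname to its IPA phonemization; `distance`
    is any callable that scores two IPA strings (lower means closer).
    """
    scored = [(name, distance(heard_ipa, ipa)) for name, ipa in directory.items()]
    return sorted(scored, key=lambda pair: pair[1])


def panphon_feature_scorer() -> Callable[[str, str], float]:
    """Hypothetical helper: build a scorer from panphon's feature edit distance.

    Assumes panphon is installed and that Distance.feature_edit_distance
    exists with a (source, target) signature.
    """
    import panphon.distance  # third-party; imported lazily on purpose

    dst = panphon.distance.Distance()
    return dst.feature_edit_distance
```

With panphon installed, something like `rank_surnames('stˈa͡ɪn', {'Steen': 'stˈiːn', 'Stone': 'stˈo͡ʊn'}, panphon_feature_scorer())` would show how the two candidates compare under feature edit distance. Swapping in a different scorer for comparison only requires passing another callable.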