English-to-IPA
Get homophones?
Hi, is there a function similar to the rhyming function but for homophones?
Never mind! Sorry for the bother. I wrote this function:
import eng_to_ipa as ipa

def get_homophones(word):
    words_that_sound_the_same = []
    the_way_this_word_looks = word
    the_way_this_word_sounds = ipa.convert(word)
    words_that_contain_that_sound = ipa.contains(the_way_this_word_sounds)
    for every_word in words_that_contain_that_sound:
        the_way_that_word_looks = every_word[0]
        the_way_that_word_sounds = every_word[1]
        # Keep only exact sound matches that are spelled differently.
        if the_way_this_word_sounds == the_way_that_word_sounds:
            if the_way_that_word_looks != the_way_this_word_looks:
                words_that_sound_the_same.append(every_word)
    return words_that_sound_the_same

get_homophones('their')  # [['there', 'ðɛr'], ["they're", 'ðɛr']]
Just reopening to ask for any tips on this problem: I'm trying to create a function that'll basically give you a homophone that is made up of multiple words.
For instance, given the word "breakthrough", give me the words "break" and "through", in order.
I made a function that chunks that into:
[['b', 'reɪkθru'],
['b', 'r', 'eɪkθru'],
['br', 'e', 'ɪkθru'],
['bre', 'ɪ', 'kθru'],
['breɪ', 'k', 'θru'],
['breɪk', 'θ', 'ru'],
['breɪkθ', 'r', 'u'],
['breɪkθr', 'u']]
(That chunking function is already written and working.)
Then it takes each segment, for example br, and checks for words that have exactly that sound and no other sound. I can use my homophone function above for that step.
The problem is dividing the IPA string in every possible way a word could be broken up. My poor function can only isolate one character at a time, but ideally it would handle segments of any length, up to the number of IPA characters in the word. For breakthrough, that's 8 characters (ignoring the stress marks for now, since I'm not sure how to deal with them). So the algorithm would divide breakthrough up like so:
1, 7:
['b', 'reɪkθru']
['breɪkθr', 'u']
2, 6:
['br', 'eɪkθru']
['breɪkθ', 'ru']
3, 5:
['bre', 'ɪkθru']
4, 4:
['breɪ', 'kθru']
5, 3:
['breɪk', 'θru']
and so on...
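One way to enumerate the splits listed above, as a minimal sketch: choose cut positions among the gaps between characters. The name all_splits and the max_parts parameter are made up for illustration. It assumes each IPA character is its own unit, so diphthongs like eɪ can be cut in half and stress marks would need to be stripped first.

```python
from itertools import combinations

def all_splits(sounds, max_parts=None):
    """Yield every way to cut `sounds` into 2..max_parts contiguous, non-empty pieces.

    Assumption: every character counts as one sound unit.
    """
    n = len(sounds)
    if max_parts is None:
        max_parts = n
    for parts in range(2, max_parts + 1):
        # A split into `parts` pieces is a choice of parts-1 cut points
        # among the n-1 gaps between characters.
        for cuts in combinations(range(1, n), parts - 1):
            bounds = (0,) + cuts + (n,)
            yield [sounds[a:b] for a, b in zip(bounds, bounds[1:])]

# Two-piece splits only: the 1/7, 2/6, ... 7/1 cases from the list above.
two_way = list(all_splits('breɪkθru', max_parts=2))
```

With max_parts left unset, the generator also yields the three-piece, four-piece, and longer splits, which grows exponentially with the length of the word, so filtering against known sounds early is probably worth it.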
But I'm posting this mostly to get any suggestions on how to go about this.
What I want to end up with is a function that I can call and get something like this:
multi_homophone('breakthrough')
[['break', 'breɪk'], ['through', 'θru']],
... (other options of combinations)
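A possible shape for that function, sketched as a recursive "word break" over the IPA string rather than generating every split first. Everything here is an assumption for illustration: SOUND_TO_WORDS is a hand-written toy table standing in for a real sound-to-spelling lookup, multi_homophone_from_ipa is a made-up name, and it takes the IPA string directly instead of the spelled word.

```python
from functools import lru_cache

# Hypothetical toy lookup from an exact IPA string to spellings.
# A real table would be built from a pronunciation dictionary.
SOUND_TO_WORDS = {
    'breɪk': ['break', 'brake'],
    'θru': ['through', 'threw'],
    'breɪkθru': ['breakthrough'],
}

def multi_homophone_from_ipa(sounds, lookup=SOUND_TO_WORDS):
    """Return every sequence of 2+ words whose sounds concatenate to `sounds`."""

    @lru_cache(maxsize=None)
    def segment(start):
        # All ways to spell sounds[start:] as a sequence of known words.
        if start == len(sounds):
            return [[]]
        results = []
        for end in range(start + 1, len(sounds) + 1):
            piece = sounds[start:end]
            for word in lookup.get(piece, []):
                for rest in segment(end):
                    results.append([[word, piece]] + rest)
        return results

    # Drop the single-word answer (the input word itself).
    return [combo for combo in segment(0) if len(combo) > 1]

options = multi_homophone_from_ipa('breɪkθru')
# options[0] is [['break', 'breɪk'], ['through', 'θru']]
```

Only prefixes that match a known sound are followed, so dead ends like 'bre' are dropped immediately, and the cache avoids re-solving the same suffix.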
Take a look at https://github.com/Kyubyong/g2p. Basically, they tag the whole sentence and then use whether or not the word is a verb to determine which homophone to use. For others, like "bowing" and "bowing", you'd have to use context clues. That would probably require a machine learning model.