Caster
Managing Vocabulary Programmatically
Is your feature request related to a problem? Please describe.
It's nice to be able to keep all your custom vocabulary in a file rather than using Dragon's vocabulary editor. Dragon vocabularies can get deleted easily and are slow to change at scale.
Describe the solution you'd like
(Ideally a single solution that works across engines, but at least for Dragon.) I think Caster should have a file with:
1) a dictionary of the form pronunciation: written-form, for adding custom words;
2) a list of words to remove from the vocabulary;
3) possibly also a replace-dictionary that catches commonly misinterpreted words (i.e. catches the words they are misinterpreted as) and replaces them with the desired interpretation.
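As a rough illustration of the three pieces described above, here is a minimal Python 3 sketch. Everything in it is hypothetical: the names `ADD_WORDS`, `REMOVE_WORDS`, `REPLACE_WORDS`, and `apply_replacements` are not part of Caster or dragonfly, and the sample entries are made up for the example. Only the replace-dictionary does any work here, since adding and removing words depends on the engine.

```python
import re

# Hypothetical vocabulary configuration; all names and entries are illustrative.
ADD_WORDS = {
    # pronunciation: written form
    "nat link": "natlink",
}

# Words to remove from the engine's vocabulary.
REMOVE_WORDS = ["cheque"]

REPLACE_WORDS = {
    # what the engine tends to hear: what was actually meant
    "castor": "Caster",
    "nat link": "natlink",
}


def apply_replacements(text, replacements=REPLACE_WORDS):
    """Replace whole-word misrecognitions in text with the intended words."""
    if not replacements:
        return text
    # Try longer keys first so multi-word keys win over shorter ones.
    keys = sorted(replacements, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b",
        re.IGNORECASE,
    )
    lookup = {k.lower(): v for k, v in replacements.items()}
    return pattern.sub(lambda m: lookup[m.group(1).lower()], text)
```

Matching on word boundaries keeps the replacement from touching longer words that merely contain a key. Whether replacement should happen on recognized text like this, or inside the engine, is an open design question.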
I don't know the best way to do this. I recall @lexxish saying he could do this using natlink, and @quintijn may have ideas. I'm not sure whether changes would need to be made at the dragonfly level (@Danesprite), but I definitely think Caster should ship with these files already set up (e.g. the dictionary), so that users can start adding words out of the box.
Dragon's vocabulary has some per-word settings, such as whether the word should be capitalized when it is first in a sentence. I don't think that's very important, but it could potentially be handled, though I think Dragon has made this hard to access (possibly relevant, but I don't recommend spending time on this: https://github.com/dictation-toolbox/dragonfly/pull/111).
I think something like this is needed, especially as speech recognition backends for dragonfly diversify. I don't mind it starting off in Caster, but ultimately the logic belongs in the dragonfly repository or in a new project. A GUI or other interfaces could then be built on top of it, using the vocabulary configuration files.
I have been working on this for the Kaldi backend, as part of trying to improve its dictation capabilities. But you're right, it really should be generalized to work with all the backends, to avoid duplication of effort and to make it easier to use different backends. I agree that at least the API should be located in dragonfly. A GUI could then be an optional component of dragonfly, or live in a separate package.
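To make the idea of a shared API a bit more concrete, here is one possible shape for it, sketched in Python 3. This is purely hypothetical: `VocabularyBackend`, `InMemoryBackend`, and `sync_vocabulary` do not exist in dragonfly, and a real design would need to account for what each engine can actually do.

```python
from abc import ABC, abstractmethod


class VocabularyBackend(ABC):
    """Hypothetical engine-neutral vocabulary interface (not a dragonfly API)."""

    @abstractmethod
    def add_word(self, written, spoken=None):
        """Add a word, optionally with a separate spoken form."""

    @abstractmethod
    def remove_word(self, written):
        """Remove a word if it is present."""


class InMemoryBackend(VocabularyBackend):
    """Stand-in backend for exercising the interface without a speech engine."""

    def __init__(self):
        self.words = {}  # written form -> spoken form

    def add_word(self, written, spoken=None):
        self.words[written] = spoken or written

    def remove_word(self, written):
        self.words.pop(written, None)


def sync_vocabulary(backend, add_words, remove_words):
    """Apply a {pronunciation: written} dict, then a removal list, to a backend."""
    for spoken, written in add_words.items():
        backend.add_word(written, spoken)
    for written in remove_words:
        backend.remove_word(written)
```

With something like this, each engine (natlink, Kaldi, and so on) would supply its own subclass, and the configuration files would only ever talk to `sync_vocabulary`.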
For natlink the functions to add/remove words are below (thanks to @quintijn). I agree the implementation should be consistent and handle multiple engines.
import re

import natlink


def deleteWordIfNecessary(w):
    """Delete w from the active vocabulary if it is present."""
    if not w:
        return None
    isInActiveVoc = natlink.getWordInfo(w, 0) is not None
    if isInActiveVoc:
        natlink.deleteWord(w)


# TODO add unicode support
# Allowed characters: ASCII letters, hyphen, backslash, slash, and space.
recharspace = re.compile(r"^[a-zA-Z\\/ -]+$")


def add_word(w):
    """Add w to the active vocabulary unless it is already active."""
    w = w.strip()
    if not w:
        return
    if not recharspace.match(w):
        print('invalid character in word to add: %s' % w)
        return
    isInVoc = natlink.getWordInfo(w, 1) is not None
    isInActiveVoc = natlink.getWordInfo(w, 0) is not None
    if isInActiveVoc:
        return
    try:
        if isInVoc:  # from backup vocabulary
            print('make backup word active: %s' % w)
            natlink.addWord(w, 0)
            # add2logfile(w, 'activated words.txt')
        else:
            print('adding word %s' % w)
            natlink.addWord(w)
            # add2logfile(w, 'new words.txt')
    except natlink.InvalidWord:
        print('not added to vocabulary, invalid word: %s' % w)
I currently load a list of words from the home directory. The format is "word,pronunciation\n"
import csv
from os.path import expanduser

# F, launch_file, and get_error are not defined in this snippet; they come
# from the rest of my setup.


def vocab_mapping():
    """Sync the vocabulary from the two CSV files and return edit commands."""
    add_words_file = expanduser("~") + '/dragon/addWords.csv'
    with open(add_words_file) as csvfile:
        reader = csv.reader(csvfile)
        for row in reader:
            if len(row) == 1:
                add_word(row[0])
            elif len(row) == 2:
                # Word and pronunciation, joined by two backslashes.
                add_word(row[0] + "\\\\" + row[1])
            else:
                raise get_error("addWords.csv", row)
    remove_words_file = expanduser("~") + '/dragon/removeWords.csv'
    with open(remove_words_file) as csvfile:
        reader = csv.reader(csvfile)
        for row in reader:
            if len(row) == 1:
                deleteWordIfNecessary(row[0])
            else:
                raise get_error("removeWords.csv", row)
    return {
        "edit add words": F(launch_file, file=add_words_file),
        "edit remove words": F(launch_file, file=remove_words_file)
    }
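For anyone who wants to try the file format without Dragon installed, the parsing step can be separated from the natlink calls. The following is a minimal Python 3 sketch under that assumption; `rows_to_words` is a hypothetical helper, not part of the code above, and unlike the original it skips blank lines instead of raising on them.

```python
import csv


def rows_to_words(lines):
    """Turn 'word[,pronunciation]' CSV lines into strings for an add-word call.

    `lines` is any iterable of strings, e.g. an open file or a list.
    """
    words = []
    for row in csv.reader(lines):
        if not row:
            continue  # blank line: nothing to add
        if len(row) == 1:
            words.append(row[0].strip())
        elif len(row) == 2:
            # Same join as above: word, two backslashes, pronunciation.
            words.append(row[0].strip() + "\\\\" + row[1].strip())
        else:
            raise ValueError("expected 1 or 2 columns, got %r" % (row,))
    return words
```

Each returned string could then be passed to `add_word`, which keeps the file handling testable on machines that have no speech engine.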