whisper-rs
whisper-rs copied to clipboard
Fix set_grammar to set pointer to array of pointers
The signature for set_grammar is incorrect - it currently takes Option<Vec<WhisperGrammarElement>>
, but whisper_full_params
requires
const whisper_grammar_element ** grammar_rules;
i.e. a pointer to an array of grammar rules.
I updated the signature to take Option<Vec<Vec<WhisperGrammarElement>>>
, which is then transformed into Vec<Box<[whisper_grammar_element]>>
. I then added a member to store the pointers to the individual grammar element slices, and store the pointer to that member in grammar_rules.
I think this should be OK if the self.grammar and self.grammar_pointers are never modified without modifying grammar_rules and n_grammar_rules at the same time. The underlying storage should be freed when FullParams is dropped.
I don't know that this is the best function signature since it requires cloning the grammar, so I'm open to suggestions.
I tested this out and it does appear to work, though I don't know how to figure out if a rule has been fully matched.
I've updated set_grammar to take a single Box<[u64]> with the following contents:
0: # of rules 1..N+1: index of rule N N+2..: grammar elements
This allows us to pass around a single self-contained item that doesn't have any self-referential pointers. It's also simple enough so that we don't need dependencies between whisper-rs and the grammar parser, and it can also be used in llama.cpp derivatives.