whisper-rs icon indicating copy to clipboard operation
whisper-rs copied to clipboard

Fix set_grammar to set pointer to array of pointers

Open jcsoo opened this issue 9 months ago • 1 comments

The signature for set_grammar is incorrect - it currently takes Option<Vec<WhisperGrammarElement>>, but whisper_full_params requires

const whisper_grammar_element ** grammar_rules;

i.e. a pointer to an array of grammar rules.

I updated the signature to take Option<Vec<Vec<WhisperGrammarElement>>>, which is then transformed into Vec<Box<[whisper_grammar_element]>>. I then added a member to store the pointers to the individual grammar element slices, and store the pointer to that member in grammar_rules.

I think this should be OK if the self.grammar and self.grammar_pointers are never modified without modifying grammar_rules and n_grammar_rules at the same time. The underlying storage should be freed when FullParams is dropped.

I don't know that this is the best function signature since it requires cloning the grammar, so I'm open to suggestions.

I tested this out and it does appear to work, though I don't know how to figure out if a rule has been fully matched.

jcsoo avatar May 01 '24 22:05 jcsoo

I've updated set_grammar to take a single Box<[u64]> with the following contents:

0: # of rules 1..N+1: index of rule N N+2..: grammar elements

This allows us to pass around a single self-contained item that doesn't have any self-referential pointers. It's also simple enough so that we don't need dependencies between whisper-rs and the grammar parser, and it can also be used in llama.cpp derivatives.

jcsoo avatar May 02 '24 16:05 jcsoo