bumblebee
WIP: Constrained sampling based on EBNF grammars
This matches llama.cpp's behavior. I finished the EBNF parser, which encodes the grammar the same way as the implementation in https://github.com/huggingface/transformers/pull/27557.
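For context, below is a minimal Python sketch of the kind of flat encoding being described. It is modeled loosely on llama.cpp's `llama_gretype` element kinds: each rule is a flat list of `(type, value)` pairs, alternates are separated by `ALT`, and the rule ends with `END`. The `GreType` class, the `alternatives` and `match_char` helpers, and the toy grammar are hypothetical illustrations. The exact layout used by the transformers PR, and by the parser in this PR, may differ.

```python
from enum import IntEnum


class GreType(IntEnum):
    # Element kinds modeled on llama.cpp's llama_gretype enum.
    END = 0             # end of a rule definition
    ALT = 1             # start of an alternate definition of the same rule
    RULE_REF = 2        # non-terminal: value is a rule id
    CHAR = 3            # terminal: value is a code point
    CHAR_NOT = 4        # negated char set, e.g. [^a]
    CHAR_RNG_UPPER = 5  # upper bound of a range started by CHAR/CHAR_NOT
    CHAR_ALT = 6        # additional char in a set, e.g. [ab]


# Hypothetical hand-encoded grammar:
#   root  ::= digit root | digit
#   digit ::= [0-9]
DIGIT = 1
RULES = [
    # rule 0: root
    [(GreType.RULE_REF, DIGIT), (GreType.RULE_REF, 0), (GreType.ALT, 0),
     (GreType.RULE_REF, DIGIT), (GreType.END, 0)],
    # rule 1: digit
    [(GreType.CHAR, ord("0")), (GreType.CHAR_RNG_UPPER, ord("9")), (GreType.END, 0)],
]


def alternatives(rule):
    """Split one flat rule into its alternates (lists of elements)."""
    alts, current = [], []
    for kind, value in rule:
        if kind in (GreType.ALT, GreType.END):
            alts.append(current)
            current = []
            if kind == GreType.END:
                break
        else:
            current.append((kind, value))
    return alts


def match_char(elements, i, cp):
    """Match code point `cp` against the char set starting at elements[i].

    Returns (matched, index just past the char set)."""
    positive = elements[i][0] == GreType.CHAR
    found = False
    while True:
        lo = elements[i][1]
        if i + 1 < len(elements) and elements[i + 1][0] == GreType.CHAR_RNG_UPPER:
            found = found or lo <= cp <= elements[i + 1][1]
            i += 2
        else:
            found = found or lo == cp
            i += 1
        # Keep going while the set continues with CHAR_ALT elements.
        if i >= len(elements) or elements[i][0] != GreType.CHAR_ALT:
            break
    return found == positive, i
```

A flat, integer-only representation like this is presumably what makes it possible to move the grammar into tensors later, since every rule is just a sequence of integer pairs.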
Unfortunately, I think we may have to refactor or redesign much of the logic for processing acceptances so that it is compatible with XLA. I know we can mimic a stack-like data structure (which I've started to do), and I believe we can mimic the trie and the other containers as well.

The issue I'm having is whether something like https://github.com/huggingface/transformers/pull/27557/files#diff-b7135bf8eda80faf271e4c9588eae893ebad019d2508df2f0afbe5b7ad5bbf4eR389 can be implemented in a non-recursive way. That is, unless my understanding is incorrect and the incremental grammar acceptance process is actually fixed for a given grammar, in which case we could "compile" the acceptance logic up front.
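One way to remove the recursion is to replace the call stack with an explicit worklist. Below is a minimal pure-Python sketch, assuming the recursive step being discussed works like llama.cpp's `llama_grammar_advance_stack`: while the top of a stack is a `RULE_REF`, replace it with one new stack per alternate of the referenced rule, and stop once the top is a terminal or the stack is empty. Each stack is also kept at a fixed capacity (a padded tuple plus a depth counter) to mimic static shapes. The names `GreType`, `MAX_DEPTH`, `advance_stack`, `initial_stacks` and the toy grammar are hypothetical, and this is not Nx code.

```python
from enum import IntEnum


class GreType(IntEnum):
    # Element kinds modeled on llama.cpp's llama_gretype enum.
    END = 0
    ALT = 1
    RULE_REF = 2
    CHAR = 3
    CHAR_NOT = 4
    CHAR_RNG_UPPER = 5
    CHAR_ALT = 6


MAX_DEPTH = 8                                # hypothetical fixed stack capacity
EMPTY = (-1, -1)                             # padding slot
EMPTY_STACK = ((EMPTY,) * MAX_DEPTH, 0)      # (buffer, depth)

# Hypothetical grammar:
#   root ::= "a" rest "c" | rest
#   rest ::= [0-9] | "b"
RULES = [
    [(GreType.CHAR, ord("a")), (GreType.RULE_REF, 1), (GreType.CHAR, ord("c")),
     (GreType.ALT, 0), (GreType.RULE_REF, 1), (GreType.END, 0)],
    [(GreType.CHAR, ord("0")), (GreType.CHAR_RNG_UPPER, ord("9")), (GreType.ALT, 0),
     (GreType.CHAR, ord("b")), (GreType.END, 0)],
]


def is_end_of_sequence(pos):
    # A position (rule_id, index) ends an alternate if it points at ALT or END.
    return RULES[pos[0]][pos[1]][0] in (GreType.END, GreType.ALT)


def alternate_starts(rule_id):
    """Positions where each alternate of a rule begins."""
    rule, starts, i = RULES[rule_id], [(rule_id, 0)], 0
    while rule[i][0] != GreType.END:
        if rule[i][0] == GreType.ALT:
            starts.append((rule_id, i + 1))
        i += 1
    return starts


def push(stack, pos):
    buf, depth = stack
    if depth == MAX_DEPTH:
        raise ValueError("stack capacity exceeded (left recursion?)")
    return (buf[:depth] + (pos,) + buf[depth + 1:], depth + 1)


def pop(stack):
    buf, depth = stack
    return (buf[:depth - 1] + (EMPTY,) + buf[depth:], depth - 1)


def advance_stack(stack):
    """Expand RULE_REFs until each stack is empty or has a terminal on top,
    using an explicit worklist instead of recursion."""
    worklist, done = [stack], []
    while worklist:
        current = worklist.pop()
        if current[1] == 0:
            done.append(current)  # empty stack: the grammar may be complete
            continue
        rule_id, idx = current[0][current[1] - 1]
        kind, value = RULES[rule_id][idx]
        if kind != GreType.RULE_REF:
            done.append(current)  # terminal (CHAR / CHAR_NOT) on top
            continue
        for start in alternate_starts(value):
            nxt = pop(current)
            after_ref = (rule_id, idx + 1)  # what follows the reference
            if not is_end_of_sequence(after_ref):
                nxt = push(nxt, after_ref)
            if not is_end_of_sequence(start):
                nxt = push(nxt, start)
            worklist.append(nxt)
    return done


def initial_stacks(root=0):
    stacks = []
    for start in alternate_starts(root):
        s = EMPTY_STACK if is_end_of_sequence(start) else push(EMPTY_STACK, start)
        stacks.extend(advance_stack(s))
    return stacks
```

In an Nx/XLA port, the Python worklist would presumably also need to become a fixed-capacity buffer of stacks plus a counter inside a `while` loop, since XLA requires static shapes. The `MAX_DEPTH` cap turns unbounded expansion (for example from left recursion) into an explicit error rather than an infinite loop.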