
Efficient and general syntactical decoding for Large Language Models

Results: 23 syncode issues

To make SynCode work on serving LLMs

When parsing multiple partial programs, the parsers currently run sequentially in a single thread; running them in parallel across multiple threads (or processes) should be much faster. [Example](https://github.com/microsoft/monitors4codegen/blame/022c65efb19cf6046d0b67960ca232ae7a351af4/src/monitors4codegen/monitor_guided_decoding/hf_gen.py#L62) [Options](https://stackoverflow.com/questions/27435284/multiprocessing-vs-multithreading-vs-asyncio)

Once LR(1) parsing is fast enough, we should add support for OpenAI models, as in [monitors4codegen](https://github.com/microsoft/monitors4codegen)

First of all, thank you for this great work! I would like to know whether SynCode can also work with fill-in-the-middle models, and if so, how?

Currently, when trying to run constrained decoding with a new grammar, we are prompted with:

```
Creating DFA mask store for LlamaTokenizerFast and custom, may take more than 10 minutes...
```
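Since the mask store depends only on the tokenizer and the grammar, one mitigation is to build it once and cache it on disk. This is a hypothetical sketch, not SynCode's actual caching mechanism; `build_fn`, the cache location, and the stand-in builder are all assumptions for illustration:

```python
# Hypothetical sketch: cache an expensive precomputed table (e.g. a
# per-token DFA mask store) keyed by (tokenizer, grammar), so the slow
# build step runs only once per combination.
import hashlib
import pickle
from pathlib import Path

CACHE_DIR = Path.home() / ".cache" / "mask_store"

def load_or_build(tokenizer_name: str, grammar_text: str, build_fn):
    key = hashlib.sha256(
        f"{tokenizer_name}\0{grammar_text}".encode()
    ).hexdigest()
    path = CACHE_DIR / f"{key}.pkl"
    if path.exists():
        return pickle.loads(path.read_bytes())  # cache hit: skip build
    store = build_fn()                          # the slow (>10 min) step
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    path.write_bytes(pickle.dumps(store))
    return store

# Usage with a stand-in builder in place of the real mask-store computation:
store = load_or_build("LlamaTokenizerFast", "start: WORD",
                      lambda: {"WORD": [1, 2, 3]})
```

A second call with the same tokenizer name and grammar text is served from disk without invoking the builder again.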

In the following example, the constrained model fails to generate a string that matches the specified grammar:

```python
import traceback
import lark
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from ...
```