havetc issues

Results: 2 issues of havetc

[FEAT] JSON constrained support

## Motivation Many LLM APIs (Together AI, Fireworks, Anyscale...) and other engines (vLLM...) support constrained generation with a JSON schema. As outlines is already a dependency of sglang,...

Returning a per request metric for number of cached_tokens read

## Motivation Right now sglang uses an advanced radix cache system, but it is not possible to know for each request how many of the tokens were computed, or read...