tokenizers

PyO3 async bindings for encode/decode in Rust

Open • michaelfeil opened this issue 7 months ago • 1 comment

PyO3 lets Rust code release the GIL, which is great. Most LLM inference servers have many cores (>200) but are bottlenecked by the GIL. Also, most servers are async by nature, so Python thread-based parallelism isn't a great fit.

Most tokenization code currently looks something like this:

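    import asyncio
    from concurrent.futures import ThreadPoolExecutor
    from typing import List

    class TokenizerService:  # hypothetical name for the surrounding class
        def __init__(self, tokenizer, threadpool: ThreadPoolExecutor):
            # e.g. a transformers fast tokenizer whose encode() returns List[int]
            self.tokenizer = tokenizer
            self._threadpool = threadpool

        def _encode(self, prompt: str) -> List[int]:
            """Encode using the Rust tokenizer directly, while releasing the GIL."""
            return self.tokenizer.encode(prompt, add_special_tokens=True)

        async def encode_prompt(self, prompt: str) -> List[int]:
            if len(prompt) > 2_000:
                # Offload long prompts to a thread to avoid blocking the event loop.
                loop = asyncio.get_running_loop()
                tokenized = await loop.run_in_executor(self._threadpool, self._encode, prompt)
            else:
                # Short prompts are cheap enough to encode inline.
                tokenized = self._encode(prompt)
            # Drop the first token (e.g. a BOS token added by add_special_tokens=True).
            return tokenized[1:]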
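
For reference, here is a self-contained sketch of the same offloading pattern, assuming Python 3.9+ (for `asyncio.to_thread`). `WhitespaceTokenizer`, `encode_prompt`, and `encode_many` are hypothetical names. The stand-in tokenizer is pure Python, so it holds the GIL and only demonstrates the control flow; a real parallel speedup would only appear with a Rust-backed tokenizer that releases the GIL while encoding.

```python
import asyncio
from typing import Dict, List


class WhitespaceTokenizer:
    """Hypothetical stand-in for a Rust-backed tokenizer (pure Python, so it holds the GIL)."""

    BOS_ID = 0
    UNK_ID = 1

    def __init__(self) -> None:
        # Tiny fixed vocabulary; unknown words map to UNK_ID.
        self.vocab: Dict[str, int] = {"hello": 2, "world": 3}

    def encode(self, text: str, add_special_tokens: bool = True) -> List[int]:
        ids = [self.vocab.get(word, self.UNK_ID) for word in text.split()]
        return [self.BOS_ID] + ids if add_special_tokens else ids


async def encode_prompt(tokenizer, prompt: str, threshold: int = 2_000) -> List[int]:
    if len(prompt) > threshold:
        # Long prompt: run encode() on asyncio's default thread pool so the loop stays free.
        ids = await asyncio.to_thread(tokenizer.encode, prompt, add_special_tokens=True)
    else:
        # Short prompt: encoding inline is likely cheaper than a thread hop.
        ids = tokenizer.encode(prompt, add_special_tokens=True)
    return ids[1:]  # drop the leading BOS token


async def encode_many(tokenizer, prompts: List[str]) -> List[List[int]]:
    # Encode prompts concurrently; asyncio.gather preserves input order.
    return list(await asyncio.gather(*(encode_prompt(tokenizer, p) for p in prompts)))


if __name__ == "__main__":
    tok = WhitespaceTokenizer()
    results = asyncio.run(encode_many(tok, ["hello world", "world " * 1_500]))
    print([len(r) for r in results])
```

Run directly, this prints `[2, 1500]`: the short prompt is encoded inline and the long one (9,000 characters) goes through the thread pool.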

Proposal: add an async runtime option for encode/decode using pyo3-async-runtimes. It's potentially worth it for every operation that takes >1 ms, or for every encode step.
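
To make the proposal concrete from the caller's side, here is a rough sketch of what awaitable encode/decode could look like. It is only a pure-Python approximation: the proposal is to return awaitables natively from Rust via pyo3-async-runtimes, whereas this shim runs the sync methods on a `ThreadPoolExecutor`. `CharTokenizer`, `AsyncTokenizer`, and `round_trip` are hypothetical names, not part of the tokenizers API.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor
from typing import List


class CharTokenizer:
    """Hypothetical stand-in for a sync, Rust-backed tokenizer: one id per character."""

    def encode(self, text: str) -> List[int]:
        return [ord(c) for c in text]

    def decode(self, ids: List[int]) -> str:
        return "".join(chr(i) for i in ids)


class AsyncTokenizer:
    """Hypothetical async facade: every call is awaited, so the event loop never blocks.

    A native binding built on pyo3-async-runtimes would hand back an awaitable from Rust;
    this shim approximates that by running the sync methods on a dedicated thread pool.
    """

    def __init__(self, tokenizer, max_workers: int = 4) -> None:
        self._tokenizer = tokenizer
        self._pool = ThreadPoolExecutor(max_workers=max_workers)

    async def encode(self, text: str) -> List[int]:
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(self._pool, self._tokenizer.encode, text)

    async def decode(self, ids: List[int]) -> str:
        loop = asyncio.get_running_loop()
        return await loop.run_in_executor(self._pool, self._tokenizer.decode, ids)

    def close(self) -> None:
        # Wait for in-flight work, then release the worker threads.
        self._pool.shutdown(wait=True)


async def round_trip(tok: AsyncTokenizer, texts: List[str]) -> List[str]:
    # Encode all texts concurrently, then decode all results concurrently.
    encoded = await asyncio.gather(*(tok.encode(t) for t in texts))
    return list(await asyncio.gather(*(tok.decode(ids) for ids in encoded)))
```

Because every call goes through the pool, callers no longer need a length threshold like the 2,000-character check in the snippet above. Whether that pays off depends on per-call overhead versus encode time, which is where the >1 ms rule of thumb comes in.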

For a similar async vs. sync API split, see: https://github.com/basetenlabs/truss/blob/0816876a474b0c4910eaa3f869ed4c685f7a7570/baseten-performance-client/src/lib.rs#L659C1-L760C20

Also, sglang, for example, uses this primitive, so an async encode could be plugged in directly there: https://github.com/sgl-project/sglang/blob/777688b8929c877e4e28c2eac208d776abe4c3af/python/sglang/srt/managers/tokenizer_manager.py#L454

michaelfeil · Jun 11 '25

cc @Narsil @ArthurZucker

michaelfeil · Jun 11 '25

Personally, happy to have this if it helps the community!

ArthurZucker · Jul 29 '25

Was this closed in #1843?

davidhewitt · Nov 30 '25

Yes, let’s close!

ArthurZucker · Nov 30 '25

@davidhewitt, if you find issues with the async or sync interfaces, feel free to optimize them or flag them.

michaelfeil · Dec 01 '25