
Allow client to provide prompt token ids instead of a string

Open tju01 opened this issue 1 year ago • 7 comments

Feature request

Please add the option to provide the prompt as a list of token ids instead of a string in the text_generation.Client.generate method.

Motivation

The official implementation of the recent LLaMA-2-Chat models tokenizes each turn of a chat conversation in isolation and then concatenates the token ids. I would like to do the same in my implementation. However, it's not possible to send the resulting list of token ids to TGI, because it only accepts a string as a prompt.
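A minimal sketch of the pattern I mean, assuming Hugging Face tokenizers (the turn strings and template markers here are simplified placeholders, not the exact LLaMA-2-Chat template):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")

# Tokenize each turn in isolation, then concatenate the token ids.
turns = ["[INST] Hello! [/INST]", "Hi! How can I help?", "[INST] Tell me a joke. [/INST]"]
input_ids: list[int] = []
for turn in turns:
    input_ids += tokenizer.encode(turn, add_special_tokens=False)

# There is currently no way to hand `input_ids` to TGI directly;
# text_generation.Client.generate only takes the prompt as a string.
```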

I understand that in this specific case it might be possible to concatenate strings on the client, including the string representations of the special tokens. But sometimes tokenize(a + b) != tokenize(a) + tokenize(b), so this might not be completely correct. I have also seen at least one other model where this is mentioned explicitly and where the authors advise concatenating token ids instead of strings (though I unfortunately can't find that model right now).
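To make the inequality concrete, here is a quick check one can run (GPT-2 is used only because it is ungated; whether the two tokenizations actually differ depends on the strings and the tokenizer):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

a, b = "Hello", "world"
tokenize_concatenated = tok.encode(a + b)              # tokenize(a + b)
concatenate_tokenized = tok.encode(a) + tok.encode(b)  # tokenize(a) + tokenize(b)

# With BPE/SentencePiece tokenizers the two can differ, because merges
# may span the seam between `a` and `b` once the strings are joined.
print(tokenize_concatenated == concatenate_tokenized)
```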

Providing a list of token ids instead of a string is already supported in both Hugging Face transformers and vLLM.
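For example, in transformers the generate API accepts token ids directly (a runnable sketch with GPT-2; the vLLM call is shown only as a comment since its exact signature has changed over time):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Pass pre-computed token ids straight to generate(); no string prompt needed.
input_ids = torch.tensor([tok.encode("Hello")])
output_ids = model.generate(input_ids, max_new_tokens=8)
print(tok.decode(output_ids[0]))

# vLLM offers a similar option, roughly:
#   llm.generate(prompt_token_ids=[tok.encode("Hello")], sampling_params=params)
```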

Your contribution

I'm unfortunately not familiar enough with Rust to provide a PR for this feature.

tju01 avatar Jul 19 '23 14:07 tju01

There are definitely some benefits in doing this.

1. We don't have to guess how the model processes our string input. We can also override tokenization to produce a list of ids that is impossible to create by raw tokenization (this has applications, for instance, in bypassing special ids).
2. Tokenization can happen on the client.

There are cons:

  • It most likely adds a lot of complexity to the current codebase.
  • Users have to tokenize on the client and can send invalid ids, which the server then has to validate (see the sketch after this list).
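A minimal sketch of the bounds check such validation would need, written in Python purely for illustration (TGI's router is in Rust, so the real check would live there; `validate_token_ids` is a hypothetical helper):

```python
def validate_token_ids(ids: list[int], vocab_size: int) -> None:
    """Reject ids outside the model's embedding table (hypothetical helper)."""
    for position, token_id in enumerate(ids):
        if not 0 <= token_id < vocab_size:
            raise ValueError(
                f"token id {token_id} at position {position} is out of range "
                f"for vocab size {vocab_size}"
            )
```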

I'm really worried about the added surface area of this feature, so I would tend to refuse such a change unless we can find a very simple way of doing it. @OlivierDehaene Wdyt?

Narsil avatar Jul 20 '23 07:07 Narsil

I agree that this would be nice, and a tokenization refactor is long overdue. We will think about it in Q4.

OlivierDehaene avatar Jul 21 '23 08:07 OlivierDehaene

My other feature request is fairly controversial, but this one would at least make it possible to tokenize on the client first, without a lossy round-trip in between. It's Q4 now, so I'm curious what the status of this is at the moment.
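By "lossy round-trip" I mean that decoding ids to text and re-encoding them is not guaranteed to reproduce the original ids. A quick illustration (GPT-2 used only as an ungated stand-in):

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")

# Today the client must send a *string* to TGI, which re-tokenizes it,
# so token ids effectively make the trip ids -> text -> ids.
ids = tok.encode("Hello world")
text = tok.decode(ids)
round_tripped = tok.encode(text)

# May be False for some inputs: special tokens, normalization, and
# merge boundaries can all change the result.
print(ids == round_tripped)
```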

LoganDark avatar Oct 19 '23 14:10 LoganDark

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] avatar Apr 26 '24 01:04 github-actions[bot]

This issue will never be stale unless the feature is added. I don't see why this bot is necessary: there's a subset of issues that may expire over time, but this one is already pretty much complete in terms of the needed info.

LoganDark avatar Apr 26 '24 01:04 LoganDark