# no prefill when decoder_input_details=True from InferenceClient
### System Info

I used the official 3.0.2 Docker image to serve a local Llama 3 Instruct model.
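For reference, the server was launched roughly along these lines (the model path and port mapping are placeholders for my local setup):

```sh
docker run --gpus all --shm-size 1g -p 8080:80 \
    -v /path/to/llama3-instruct:/data \
    ghcr.io/huggingface/text-generation-inference:3.0.2 \
    --model-id /data
```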
### Information
- [x] Docker
- [ ] The CLI directly
### Tasks
- [x] An officially supported command
- [ ] My own modifications
### Reproduction

I used the official 3.0.2 Docker image to serve a local Llama 3 Instruct model and called it with `InferenceClient` (see some interaction here). The endpoint URL below is an assumption based on the port mapping above:

```python
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")  # assumed endpoint for the local TGI container

output = client.text_generation(
    "Today is a ",
    max_new_tokens=2,
    do_sample=True,
    temperature=1.0,
    details=True,
    decoder_input_details=True,
)
```
The output is below; `prefill` is empty:

```python
TextGenerationOutput(
    generated_text='5-minute',
    details=TextGenerationOutputDetails(
        finish_reason='length',
        generated_tokens=2,
        prefill=[],
        tokens=[
            TextGenerationOutputToken(id=20, logprob=-2.1425781, special=False, text='5'),
            TextGenerationOutputToken(id=24401, logprob=-4.4609375, special=False, text='-minute'),
        ],
        best_of_sequences=None,
        seed=9305067545921572115,
        top_tokens=None,
    ),
)
```
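To help isolate whether the Python client or the server is dropping the prefill, the same request can be sent to the server's `/generate` route directly; a minimal sketch, assuming the default port mapping from the `docker run` above:

```python
import requests

# Raw request to the TGI /generate route with the same parameters as above.
resp = requests.post(
    "http://localhost:8080/generate",  # assumed endpoint
    json={
        "inputs": "Today is a ",
        "parameters": {
            "max_new_tokens": 2,
            "do_sample": True,
            "temperature": 1.0,
            "details": True,
            "decoder_input_details": True,
        },
    },
)
# Check whether "prefill" is populated at this level.
print(resp.json()["details"]["prefill"])
```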
### Expected behavior

I expect `prefill` to be populated with the prompt tokens and their logprobs, as shown in the docs here.
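For comparison, a populated `prefill` would look something like the following. The ids, texts, and logprobs are illustrative placeholders, not real output; recent `huggingface_hub` versions type these entries as `TextGenerationOutputPrefillToken`, and the first prompt token typically carries no logprob:

```python
from huggingface_hub import TextGenerationOutputPrefillToken

# Illustrative placeholders only -- not actual ids or logprobs.
expected_prefill = [
    TextGenerationOutputPrefillToken(id=128000, logprob=None, text='<|begin_of_text|>'),
    TextGenerationOutputPrefillToken(id=15724, logprob=-8.9, text='Today'),
    TextGenerationOutputPrefillToken(id=374, logprob=-1.2, text=' is'),
]
```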