# no prefill when decoder_input_details=True from InferenceClient
### System Info

I used the official 3.0.2 Docker image to serve a local Llama 3 Instruct model.
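For reference, the server was launched roughly along these lines (the model path and port mapping are placeholders for my local setup):

```sh
docker run --gpus all --shm-size 1g -p 8080:80 \
    -v /path/to/llama3-instruct:/data \
    ghcr.io/huggingface/text-generation-inference:3.0.2 \
    --model-id /data
```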
### Information
- [x] Docker
- [ ] The CLI directly
### Tasks
- [x] An officially supported command
- [ ] My own modifications
### Reproduction

I used the official 3.0.2 Docker image to serve a local Llama 3 Instruct model and called it with `InferenceClient` (see some interaction here). The endpoint URL below is an assumption based on the port mapping above:

```python
from huggingface_hub import InferenceClient

client = InferenceClient("http://localhost:8080")  # assumed endpoint for the local TGI container

output = client.text_generation(
    "Today is a ",
    max_new_tokens=2,
    do_sample=True,
    temperature=1.0,
    details=True,
    decoder_input_details=True,
)
```
The output is below; `prefill` is empty:

```python
TextGenerationOutput(
    generated_text='5-minute',
    details=TextGenerationOutputDetails(
        finish_reason='length',
        generated_tokens=2,
        prefill=[],
        tokens=[
            TextGenerationOutputToken(id=20, logprob=-2.1425781, special=False, text='5'),
            TextGenerationOutputToken(id=24401, logprob=-4.4609375, special=False, text='-minute'),
        ],
        best_of_sequences=None,
        seed=9305067545921572115,
        top_tokens=None,
    ),
)
```
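To help isolate whether the Python client or the server is dropping the prefill, the same request can be sent to the server's `/generate` route directly; a minimal sketch, assuming the default port mapping from the `docker run` above:

```python
import requests

# Raw request to the TGI /generate route with the same parameters as above.
resp = requests.post(
    "http://localhost:8080/generate",  # assumed endpoint
    json={
        "inputs": "Today is a ",
        "parameters": {
            "max_new_tokens": 2,
            "do_sample": True,
            "temperature": 1.0,
            "details": True,
            "decoder_input_details": True,
        },
    },
)
# Check whether "prefill" is populated at this level.
print(resp.json()["details"]["prefill"])
```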
### Expected behavior

I expect `prefill` to be populated with the prompt tokens and their logprobs, as shown in the docs here.
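For comparison, a populated `prefill` would look something like the following. The ids, texts, and logprobs are illustrative placeholders, not real output; recent `huggingface_hub` versions type these entries as `TextGenerationOutputPrefillToken`, and the first prompt token typically carries no logprob:

```python
from huggingface_hub import TextGenerationOutputPrefillToken

# Illustrative placeholders only -- not actual ids or logprobs.
expected_prefill = [
    TextGenerationOutputPrefillToken(id=128000, logprob=None, text='<|begin_of_text|>'),
    TextGenerationOutputPrefillToken(id=15724, logprob=-8.9, text='Today'),
    TextGenerationOutputPrefillToken(id=374, logprob=-1.2, text=' is'),
]
```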