target_prefix latency

Open SimonBenhamou opened this issue 2 months ago • 2 comments

Hello,

I noticed that when a target_prefix is supplied to the translate_batch or generate_tokens method, the latency for the supplied tokens is the same as when they are not provided. I would expect that latency to be negligible, because those tokens don't require any generation steps: the first step should be the generation of the token that follows the prefix.

Am I missing something, or is this due to an inefficiency in ctranslate2's generation logic?
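To make the expectation concrete, here is a toy sketch in Python. It is not CTranslate2 code and makes no claim about how the library is implemented: the function names, the placeholder tokens, and the idea of counting "decoder passes" are all hypothetical, chosen only to contrast the two behaviours I have in mind.

```python
from typing import List, Tuple


def decode_forcing_prefix(prefix: List[str], max_new_tokens: int) -> Tuple[List[str], int]:
    """Toy decoder that spends one decoder pass per prefix token.

    Hypothetical model of the behaviour I seem to observe: each supplied
    token costs as much as a generated one.
    """
    passes = 0
    output: List[str] = []
    for token in prefix:
        passes += 1  # one pass just to force a token that is already known
        output.append(token)
    for i in range(max_new_tokens):
        passes += 1  # one pass per newly generated token
        output.append(f"<gen{i}>")  # placeholder instead of a real prediction
    return output, passes


def decode_prefilling_prefix(prefix: List[str], max_new_tokens: int) -> Tuple[List[str], int]:
    """Toy decoder that consumes the whole prefix in a single pass.

    Hypothetical model of the behaviour I would expect: the prefix is
    processed once, then generation starts at the token after it.
    """
    passes = 0
    output: List[str] = list(prefix)
    if prefix:
        passes += 1  # a single pass over all prefix tokens at once
    for i in range(max_new_tokens):
        passes += 1  # one pass per newly generated token
        output.append(f"<gen{i}>")
    return output, passes


if __name__ == "__main__":
    prefix = ["a", "b", "c"]
    forced_tokens, forced_passes = decode_forcing_prefix(prefix, 2)
    prefilled_tokens, prefilled_passes = decode_prefilling_prefix(prefix, 2)
    print(forced_tokens == prefilled_tokens)  # same output sequence
    print(forced_passes, prefilled_passes)    # different number of passes
```

With a 3-token prefix and 2 new tokens, the first function counts 5 passes and the second counts 3, while both return the same sequence. The gap grows with the prefix length, which is the difference in latency I was expecting to see.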

Thanks, Simon

SimonBenhamou, Apr 30 '24 16:04