Deduplicate and order suppression tokens in apply, not add

Open · rjames-0 opened this issue 1 month ago

This PR removes an overhead that grows with batch size when running faster-whisper with large batches and token suppression enabled. It fixes the issue reported in https://github.com/OpenNMT/CTranslate2/issues/1566.

Sorting and deduplicating the suppressed-token list is no longer done on every `add` call, which scaled poorly with batching. Instead, it is done once in `apply`, just before the CUDA kernel is launched.
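The following Python sketch contrasts the two strategies. It is not CTranslate2's code, whose change lives in its C++ sources. The class names, the flat `batch_index * vocabulary_size + token_id` indexing, and the Python loop that stands in for the CUDA kernel are all hypothetical, chosen only to illustrate moving the sort/deduplicate step from `add` to `apply`.

```python
import math


class _SuppressListBase:
    """Hypothetical collector of (batch_index, token_id) pairs to suppress."""

    def __init__(self, vocabulary_size: int) -> None:
        self.vocabulary_size = vocabulary_size
        self.flat_indices: list[int] = []

    def _flat(self, batch_index: int, token_id: int) -> int:
        # Assumed flat layout: one row of vocabulary_size logits per batch item.
        return batch_index * self.vocabulary_size + token_id

    def prepared_indices(self) -> list[int]:
        raise NotImplementedError

    def apply(self, logits: list[list[float]]) -> None:
        # Stand-in for the kernel launch: mask every prepared index with -inf.
        for flat in self.prepared_indices():
            batch_index, token_id = divmod(flat, self.vocabulary_size)
            logits[batch_index][token_id] = -math.inf
        self.flat_indices.clear()


class EagerSuppressList(_SuppressListBase):
    """'Before' variant (assumed): sorts and deduplicates on every add()."""

    def add(self, batch_index: int, token_id: int) -> None:
        self.flat_indices.append(self._flat(batch_index, token_id))
        # Re-sorting the whole list on each call makes the total cost of
        # n add() calls grow superlinearly in n.
        self.flat_indices = sorted(set(self.flat_indices))

    def prepared_indices(self) -> list[int]:
        # Already sorted and unique, so just copy.
        return list(self.flat_indices)


class LazySuppressList(_SuppressListBase):
    """'After' variant: add() only appends; ordering happens once at apply time."""

    def add(self, batch_index: int, token_id: int) -> None:
        # Amortized O(1) append; duplicates and arbitrary order are allowed here.
        self.flat_indices.append(self._flat(batch_index, token_id))

    def prepared_indices(self) -> list[int]:
        # Deduplicate and sort exactly once, right before masking.
        return sorted(set(self.flat_indices))


logits = [[0.0] * 5 for _ in range(2)]
lazy = LazySuppressList(vocabulary_size=5)
for b in range(2):
    for t in (3, 1, 3):  # duplicate token ids are fine
        lazy.add(b, t)
print(lazy.prepared_indices())  # [1, 3, 6, 8]
lazy.apply(logits)
print(logits[1])  # [0.0, -inf, 0.0, -inf, 0.0]
```

Both variants hand the same sorted, unique indices to the masking step. Assuming the old path re-sorted the full list on every call, n `add` calls cost roughly O(n² log n) in total, versus O(n log n) for a single sort in `apply`. That is consistent with the overhead growing as batches get larger.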

Results from a local test, with and without the optimization:

Oct 31 '25 12:10 rjames-0