CTranslate2
deduplicate and order suppression tokens in apply not add
This PR implements an optimization that prevents growing overhead when running faster-whisper with large batches and token suppression enabled. Fixes the issue reported in https://github.com/OpenNMT/CTranslate2/issues/1566
The suppress token list is no longer sorted and deduplicated on every add call (which scaled badly when batching). Instead, this is done once in apply, just before launching the CUDA kernel.
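A minimal sketch of the idea, not the actual CTranslate2 code: the class name `SuppressionList`, its members, and the flat-index layout (`batch_id * vocabulary_size + token_id`) are assumptions made for illustration. It shows add only recording a pair, and apply doing the single sort and deduplication pass.

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical helper (not CTranslate2's real class): collects the
// (batch_id, token_id) pairs to suppress and defers ordering and
// deduplication until apply().
class SuppressionList {
public:
  // Amortized O(1): only records the pair, with no sort or duplicate check.
  void add(std::int32_t batch_id, std::int32_t token_id) {
    pending_.emplace_back(batch_id, token_id);
  }

  // Sorts and deduplicates once, then returns flat indices into an assumed
  // [batch_size, vocabulary_size] logits buffer. In the real code this is
  // the point where the list would be handed to the CUDA kernel.
  std::vector<std::int64_t> apply(std::int64_t vocabulary_size) {
    std::sort(pending_.begin(), pending_.end());
    pending_.erase(std::unique(pending_.begin(), pending_.end()),
                   pending_.end());

    std::vector<std::int64_t> flat_indices;
    flat_indices.reserve(pending_.size());
    for (const auto& entry : pending_)
      flat_indices.push_back(entry.first * vocabulary_size + entry.second);
    return flat_indices;
  }

private:
  std::vector<std::pair<std::int32_t, std::int32_t>> pending_;
};
```

With n recorded pairs, this costs one O(n log n) sort per apply rather than reordering the list on each of the n add calls, which is where the overhead with large batches would come from under these assumptions.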
Results from a local machine test with and without the optimization: