
Add Arctic Inference Suffix Decoding

Open · bradhilton opened this issue 8 months ago · 1 comment

Arctic Inference's Suffix Decoding (AISD) is a speculative-decoding variant that caches repeating token suffixes, speculates continuations from them, and verifies the speculated tokens in bulk. It makes raw decoding 2×–6× faster and delivers roughly 2×–4× end-to-end speed-ups in vLLM-based workloads ([snowflake.com][1], [snowflake.com][2]).
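To make the mechanism concrete, here is a minimal toy sketch of suffix-based speculation with greedy verification. It is not Arctic Inference's implementation or API: the function names `propose_from_suffix` and `verify_draft` are hypothetical, a naive linear scan stands in for a suffix tree, and the "target model" is a toy callable. A real engine would score all drafted positions in a single forward pass instead of calling the model once per token.

```python
from typing import Callable, List, Sequence

# Hypothetical toy "model": maps a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def propose_from_suffix(
    history: Sequence[int],
    context: Sequence[int],
    max_match: int = 8,
    max_spec: int = 4,
) -> List[int]:
    """Draft tokens by matching the longest suffix of `context` inside `history`.

    Tries suffix lengths from `max_match` down to 1 and returns up to `max_spec`
    tokens that followed the most recent earlier occurrence of that suffix.
    A naive scan stands in for the suffix tree a real system would use.
    """
    for n in range(min(max_match, len(context)), 0, -1):
        pattern = list(context[-n:])
        # Only consider matches that have at least one following token.
        for start in range(len(history) - n - 1, -1, -1):
            if list(history[start:start + n]) == pattern:
                return list(history[start + n:start + n + max_spec])
    return []  # No match: fall back to ordinary one-token decoding.


def verify_draft(context: List[int], draft: List[int], target_next: NextToken) -> List[int]:
    """Greedy verification: keep the longest draft prefix the target model agrees with.

    On the first mismatch the target's own token is emitted instead; if every
    draft token is accepted, the target contributes one extra "bonus" token.
    (Sequential calls here for clarity; a real engine scores all positions in one pass.)
    """
    accepted: List[int] = []
    for tok in draft:
        expected = target_next(context + accepted)
        if tok != expected:
            accepted.append(expected)
            return accepted
        accepted.append(tok)
    accepted.append(target_next(context + accepted))
    return accepted


if __name__ == "__main__":
    # Toy target model that cycles through tokens 1..5.
    def cycle_model(seq: List[int]) -> int:
        return seq[-1] % 5 + 1

    history = [1, 2, 3, 4, 5, 1, 2]  # earlier outputs the cache has seen
    context = [9, 1, 2]              # current sequence being decoded
    draft = propose_from_suffix(history, context)
    print(draft)                                      # [3, 4, 5, 1]
    print(verify_draft(context, draft, cycle_model))  # [3, 4, 5, 1, 2]
```

In this toy run, five tokens come out of one verification round because the repeated pattern `1, 2` predicted the continuation exactly. When the cache has no match, `verify_draft` with an empty draft degrades to emitting a single token, i.e. ordinary decoding.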

Because inference often dominates ART's wall time, adding Arctic Inference support could substantially reduce total training time.
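How much of a decoding speed-up carries over to total training time depends on what share of wall time inference actually takes. Amdahl's law gives that bound: with inference fraction p and inference speed-up s, the overall speed-up is 1 / ((1 − p) + p / s). The sketch below uses a hypothetical 80% inference share purely for illustration; it is not a measurement of ART.

```python
def end_to_end_speedup(inference_fraction: float, inference_speedup: float) -> float:
    """Amdahl's law: overall speedup when only the inference share of wall time gets faster.

    inference_fraction: share of total wall time spent in inference (0..1).
    inference_speedup: factor by which inference itself gets faster (> 0).
    """
    if not 0.0 <= inference_fraction <= 1.0:
        raise ValueError("inference_fraction must be in [0, 1]")
    if inference_speedup <= 0.0:
        raise ValueError("inference_speedup must be positive")
    return 1.0 / ((1.0 - inference_fraction) + inference_fraction / inference_speedup)


if __name__ == "__main__":
    # Hypothetical: inference is 80% of wall time; apply the reported 2x-4x range.
    for s in (2.0, 4.0):
        print(f"{s:.0f}x faster inference -> {end_to_end_speedup(0.8, s):.2f}x overall")
```

With that made-up 80% share, 2× and 4× faster inference translate to about 1.67× and 2.50× overall; the larger the inference share, the closer the training speed-up gets to the raw inference speed-up.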

bradhilton · Jun 30 '25 23:06