ART
Add Arctic Inference Suffix Decoding
Arctic Inference’s Suffix Decoding (AISD) is a speculative-decoding variant that caches repeating suffixes and bulk-verifies them, reportedly speeding up raw decoding by 2×–6× and delivering roughly 2×–4× end-to-end speed-ups in vLLM-based workloads ([snowflake.com]1, [snowflake.com]2).
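To make the "cache repeating suffixes, then bulk-verify" idea concrete, here is a toy sketch of suffix-based drafting. It is not Arctic Inference's implementation: `propose_draft`, `verify`, and the `next_token` callback are hypothetical names, the cache is just a linear scan over the token history rather than a suffix tree, and verification calls the model once per token, whereas a real system would check the whole draft in a single batched forward pass.

```python
from typing import Callable, List, Sequence


def propose_draft(tokens: Sequence[int], max_suffix: int = 4, max_draft: int = 8) -> List[int]:
    """Hypothetical helper: find the longest recent suffix that also occurred
    earlier in the history and return the tokens that followed it there."""
    n = len(tokens)
    # Try the longest suffix first, then progressively shorter ones.
    for k in range(min(max_suffix, n - 1), 0, -1):
        suffix = list(tokens[n - k:])
        # Scan earlier positions from most recent to oldest.
        for start in range(n - k - 1, -1, -1):
            if list(tokens[start:start + k]) == suffix:
                return list(tokens[start + k:start + k + max_draft])
    return []


def verify(tokens: Sequence[int], draft: Sequence[int],
           next_token: Callable[[List[int]], int]) -> List[int]:
    """Accept the longest draft prefix that matches the target model's greedy output.
    Sequential here for clarity; real speculative decoding verifies in one pass."""
    accepted: List[int] = []
    context = list(tokens)
    for t in draft:
        if next_token(context) != t:
            break
        accepted.append(t)
        context.append(t)
    return accepted
```

With a repetitive history such as `[1, 2, 3, 4, 1, 2]`, the suffix `[1, 2]` matches the start of the sequence, so the draft is the continuation seen there; every accepted draft token is one the target model did not have to generate step by step.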
Because ART's wall time is often dominated by inference, adding Arctic Inference support could substantially reduce total training time.
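As a rough way to size the potential gain, Amdahl's law relates the inference speed-up to the overall one. The sketch below is illustrative only: `overall_speedup` is a hypothetical helper, and the 80% inference share is an assumed figure, not a measurement of ART.

```python
def overall_speedup(inference_fraction: float, inference_speedup: float) -> float:
    """Amdahl's law: overall speed-up when only the inference share gets faster."""
    if not 0.0 <= inference_fraction <= 1.0:
        raise ValueError("inference_fraction must be between 0 and 1")
    if inference_speedup <= 0.0:
        raise ValueError("inference_speedup must be positive")
    remaining = (1.0 - inference_fraction) + inference_fraction / inference_speedup
    return 1.0 / remaining


# Assumed: inference accounts for 80% of wall time (hypothetical figure).
for s in (2.0, 4.0):
    print(f"{s:.0f}x faster inference -> {overall_speedup(0.8, s):.2f}x overall")
```

Under that assumption, a 2×–4× end-to-end inference speed-up would translate to roughly 1.7×–2.5× faster training overall; the larger the inference share, the closer the overall gain gets to the inference gain.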