
Inference Speed

Open jacklishufan opened this issue 8 months ago • 3 comments

Hi, thanks for the great work. Can the authors provide some insight into the inference speed of LLaDA versus an 8B autoregressive LM? LLaDA appears slower in some of my initial tests, which could be because it cannot leverage a KV-cache, since its attention mask is non-causal. However, isn't one of the alleged advantages of diffusion LMs that they are supposed to be faster?
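A toy cost model (my own sketch, not from the LLaDA paper; all lengths and step counts below are assumptions) illustrates why the missing KV-cache hurts: an autoregressive decoder with a cache computes one new query per step against cached keys, while a bidirectional diffusion model re-attends over the full sequence at every denoising step.

```python
def ar_attention_cost(prompt_len: int, gen_len: int) -> int:
    """Attention score count for autoregressive decoding with a KV-cache:
    step t computes 1 query against prompt_len + t cached keys."""
    return sum(prompt_len + t for t in range(1, gen_len + 1))

def diffusion_attention_cost(prompt_len: int, gen_len: int, steps: int) -> int:
    """Attention score count for a non-causal diffusion LM: each denoising
    step is a full forward pass, quadratic in total sequence length."""
    seq = prompt_len + gen_len
    return steps * seq * seq

# Assumed workload: 512-token prompt, 256 generated tokens, 256 denoising steps.
ar = ar_attention_cost(512, 256)
diff = diffusion_attention_cost(512, 256, 256)
print(diff / ar)  # roughly a 900x gap in this toy setting
```

This ignores the FFN layers and real kernel efficiency, but it shows why fewer denoising steps than generated tokens (or some form of caching) is needed for a diffusion LM to compete on wall-clock speed.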

jacklishufan avatar Mar 13 '25 19:03 jacklishufan

I'm observing extremely slow inference speeds when running few-shot evaluation on gsm8k using EleutherAI/lm-evaluation-harness. On an A100, a single instance is taking around 20 minutes! The default is to use 5-shot (5 examples in context). Is this expected?
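One contributing factor (my own back-of-envelope sketch; the prompt lengths below are assumptions, not measured from the harness) is that 5-shot prompts are long, and without a KV-cache every denoising step re-attends over the entire context, so per-step attention cost grows quadratically with prompt length:

```python
def per_step_attention(prompt_len: int, gen_len: int) -> int:
    # Non-causal attention over the whole sequence at every denoising step.
    seq = prompt_len + gen_len
    return seq * seq

zero_shot = per_step_attention(100, 256)    # assumed short 0-shot prompt
five_shot = per_step_attention(1500, 256)   # assumed 5 worked examples in context
print(five_shot / zero_shot)  # ~24x more attention work per step
```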

dhruvdcoder avatar May 16 '25 02:05 dhruvdcoder

> I'm observing extremely slow inference speeds when running few-shot evaluation on gsm8k using EleutherAI/lm-evaluation-harness. On an A100, a single instance is taking around 20 minutes! The default is to use 5-shot (5 examples in context). Is this expected?

Hi, I have encountered the same issue. I tried to evaluate LLaDA-Instruct on gsm8k using a single A100, and the estimated runtime is around 14 hours, which is quite slow.
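A quick sanity check on that figure (assuming the standard GSM8K test split of 1,319 problems; the per-sample latency here is inferred from the 14-hour estimate, not measured):

```python
gsm8k_test_samples = 1319     # standard GSM8K test split size
per_sample_seconds = 38       # implied latency if the full run takes ~14 h
total_hours = gsm8k_test_samples * per_sample_seconds / 3600
print(round(total_hours, 1))  # ~13.9 hours, consistent with the report above
```

So roughly 38 seconds per problem, far slower than a comparably sized autoregressive model, but much faster than the ~20 minutes per instance reported earlier, which suggests the two runs used different generation lengths or denoising-step counts.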

bioLydia avatar Jul 03 '25 08:07 bioLydia