Prefilling for the chunk during inference

Open zyy-fc opened this issue 8 months ago • 1 comment

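Hello, I read your paper carefully and have a question I'd like to ask:

At the end of Section 3.2.2 you write: "We then apply additional prefilling for this chunk with M and a timestep of 0.999 to generate the kv-cache for efficient inference"

Looking through the inference code, this step seems to correspond to ode_wrapper. I don't quite understand what its main purpose is. Why is the kv-cache computed through this step?

I'd appreciate your guidance. Thanks!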

Apr 09 '25 10:04 zyy-fc

Good question. Note that the detokenizer is trained to denoise the current chunk using the previous clean chunks as context. This additional prefilling basically updates that context by adding the newly cleaned chunk back in, so the context is ready when we process the next chunk.

Apr 29 '25 06:04 jzq2000
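The answer above can be illustrated with a toy sketch of chunk-by-chunk generation. Everything here is a hypothetical stand-in, not the repository's code or its ode_wrapper: `toy_detokenizer` replaces the real model and its "attention" is just the mean of the cached values, `KVCache` and `generate` are invented names, and the sampler is assumed to be a straight-path (flow-matching style) ODE solved with Euler steps, with t = 1 taken as clean data so that 0.999 means "nearly clean". The point it shows: while a chunk is being denoised, the model reads the cache but the noisy chunk's K/V is thrown away; once the chunk is clean, one extra forward pass at t = 0.999 writes its K/V into the cache, so the next chunk is conditioned on clean context, matching how the detokenizer is described as being trained.

```python
import random
from dataclasses import dataclass, field
from typing import List, Tuple

PREFILL_T = 0.999  # near-clean timestep for the extra prefill pass (from Section 3.2.2)


@dataclass
class KVCache:
    """Hypothetical cache of per-token keys/values from previous *clean* chunks."""
    keys: List[float] = field(default_factory=list)
    values: List[float] = field(default_factory=list)

    def append(self, ks: List[float], vs: List[float]) -> None:
        self.keys.extend(ks)
        self.values.extend(vs)


def toy_detokenizer(x: List[float], t: float, cond: List[float],
                    cache: KVCache) -> Tuple[List[float], List[float], List[float]]:
    """Hypothetical stand-in for the detokenizer (NOT the real model).

    Returns (velocity, keys, values) for chunk x at timestep t.
    Toy 'attention': the mean of cached values acts as the clean context.
    """
    ctx = sum(cache.values) / len(cache.values) if cache.values else 0.0
    target = [c + ctx for c in cond]  # what this chunk should denoise to
    # Velocity of the straight path from x (at time t) to target (at t = 1).
    vel = [(g - xi) / (1.0 - t) for g, xi in zip(target, x)]
    # Toy K/V: the chunk's own tokens (a real model would project them).
    return vel, list(x), list(x)


def generate(conds: List[List[float]], steps: int = 8, seed: int = 0):
    rng = random.Random(seed)
    cache = KVCache()
    outputs = []
    for cond in conds:
        # 1) Denoise the current chunk from noise (t=0) to clean (t=1) with Euler
        #    steps. The cache is read, but the noisy chunk's K/V is discarded.
        x = [rng.gauss(0.0, 1.0) for _ in cond]
        dt = 1.0 / steps
        for i in range(steps):
            vel, _, _ = toy_detokenizer(x, i * dt, cond, cache)
            x = [xi + dt * vi for xi, vi in zip(x, vel)]
        outputs.append(x)
        # 2) Extra prefill: one forward pass on the now-clean chunk at t=0.999,
        #    keeping only its K/V so the next chunk sees it as clean context.
        _, ks, vs = toy_detokenizer(x, PREFILL_T, cond, cache)
        cache.append(ks, vs)
    return outputs, cache


if __name__ == "__main__":
    outs, cache = generate([[1.0, 2.0], [0.0, 0.0], [5.0, 5.0]])
    print(outs)
    print(len(cache.values))
```

In this toy, the cache only ever contains K/V computed from clean chunks, so the second chunk is conditioned on the first chunk's clean output rather than on any of its intermediate noisy states. Skipping step 2 would leave the cache empty and every chunk would be generated without context.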