Kimi-K2
Question: why chunk-wise autoregressive generation?
I don't quite understand why chunk-wise autoregressive generation can solve the length limitations of LLMs.
In Figure 4, each input to the rewrite model contains the full input excerpt (i.e., the original text). How is this different from directly feeding the full input excerpt into the rewrite model?
It is to avoid information loss. By chunking, you force the model to focus on each part of the full input in turn, making sure that every piece of information is considered during rewriting. If the model were asked to rewrite the whole excerpt in a single pass, it could compress or skip details; generating chunk by chunk, each call only has to faithfully rewrite its own chunk while still seeing the full excerpt for context.
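
For concreteness, here is a minimal Python sketch of the loop, under the assumption that each generation call receives the full excerpt plus the chunks rewritten so far. `llm_generate`, the prompt template, and the fixed-size `split_into_chunks` are all hypothetical stand-ins for illustration, not the paper's actual interface:

```python
# Minimal sketch of chunk-wise autoregressive rewriting (assumptions noted below).

def split_into_chunks(text: str, chunk_size: int = 1000) -> list[str]:
    """Naive fixed-size chunking; a real system might split on sentence boundaries."""
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

def llm_generate(prompt: str) -> str:
    """Hypothetical placeholder for an LLM call. Here it just echoes the text
    after the last ':\n' marker so the sketch runs end to end."""
    return prompt.rsplit(":\n", 1)[-1]

def rewrite_chunkwise(full_excerpt: str) -> str:
    rewritten_chunks: list[str] = []
    for chunk in split_into_chunks(full_excerpt):
        # Every call sees the FULL input excerpt (as in Figure 4) plus the
        # chunks rewritten so far (the autoregressive part), but is asked to
        # rewrite only the current chunk.
        prompt = (
            f"Full input excerpt:\n{full_excerpt}\n\n"
            f"Rewritten so far:\n{''.join(rewritten_chunks)}\n\n"
            f"Rewrite the next chunk, staying faithful to it:\n{chunk}"
        )
        rewritten_chunks.append(llm_generate(prompt))
    return "".join(rewritten_chunks)
```

The difference from feeding the excerpt in once is that no single generation has to produce (and therefore compress) the whole rewrite: each call's output is bounded to one chunk, while the full excerpt is still available as conditioning context.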