ChatRWKV
No time_shift use in ChatRWKV?
Hi,
Why is time_shift not applied to x in ChatRWKV before computing x * self.time_mix_k + xx * (1 - self.time_mix_k), whereas it is in RWKV V4? Any idea?
state[5*i+1] holds the x from the previous iteration, see here. That is exactly what time_shift does.
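To make the equivalence concrete, here is a minimal NumPy sketch, not the actual ChatRWKV or RWKV V4 code. It assumes time_shift moves a (T, C) sequence one step forward in time and zero-pads the first step, and that the recurrent state starts at zero. The names time_shift, mix_parallel, mix_recurrent and prev_x are made up for illustration; prev_x plays the role of state[5*i+1].

```python
import numpy as np

def time_shift(x):
    # Assumed behaviour: row t of the result is row t-1 of x, row 0 is zeros.
    shifted = np.zeros_like(x)
    shifted[1:] = x[:-1]
    return shifted

def mix_parallel(x, time_mix_k):
    # Training-style form: the whole sequence is processed at once.
    xx = time_shift(x)
    return x * time_mix_k + xx * (1 - time_mix_k)

def mix_recurrent(x, time_mix_k):
    # RNN-style form: one token at a time, previous x kept in the state.
    prev_x = np.zeros(x.shape[1])  # stands in for state[5*i+1], assumed zero at start
    out = []
    for t in range(x.shape[0]):
        xx = prev_x
        out.append(x[t] * time_mix_k + xx * (1 - time_mix_k))
        prev_x = x[t]  # store the current x for the next iteration
    return np.stack(out)

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 4))   # 6 time steps, 4 channels
time_mix_k = rng.uniform(size=4)  # per-channel mixing weights

parallel = mix_parallel(x, time_mix_k)
recurrent = mix_recurrent(x, time_mix_k)
print(np.allclose(parallel, recurrent))
```

Under these assumptions both forms give the same mixed input at every step, so the explicit shift is only needed when the whole sequence is processed in parallel.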