Milo Q

Results: 3 comments by Milo Q

My personal understanding: the prefusion module effectively acts as the first few layers of the LLM, while the LLM itself serves as the later layers. So it is fine that the LLM receives less information in those later layers.

@MonolithFoundation How do you explain the speed gains reported in these papers, e.g., in terms of FLOPs?