LoganDark

Results: 974 comments by LoganDark

> this led me to find out that the inference time is proportional to the input length but why is it? because transformers like GPT-Neo and GPT-J have to run...

With my RWKV sequence mode implementation, it's actually fastest with only one thread, because ggml's atomic polling or work stealing or whatever is so damn expensive that it becomes a...

I feel like the real solution to this problem would be #251

> The creation of a new graph with thousands of nodes and running it on every token is in the order of 100s of **microseconds** - i.e. much less than...

> > I'm on drugs right now so my ability to explain is heavily impaired, sorry lol
>
> That's sprite! drug is needed for this kind of brain twisted...

> It shouldn't be something that we want to incentivize though, since if your graph is too big, you are much better off trying to find ways to make it smaller....

this is mostly possible if you don't mind reading the implementation of every function to figure out *exactly* what it does:

https://github.com/saharNooby/rwkv.cpp/blob/6b26e0db28b26f0fb2c73c5aa6ff490818fb1456/rwkv.cpp#L942-L958

https://github.com/saharNooby/rwkv.cpp/blob/6b26e0db28b26f0fb2c73c5aa6ff490818fb1456/rwkv.cpp#L505-L519

> this is mostly possible if you don't mind reading the implementation of every function to figure out _exactly_ what it does:
>
> https://github.com/saharNooby/rwkv.cpp/blob/6b26e0db28b26f0fb2c73c5aa6ff490818fb1456/rwkv.cpp#L942-L958
>
> https://github.com/saharNooby/rwkv.cpp/blob/6b26e0db28b26f0fb2c73c5aa6ff490818fb1456/rwkv.cpp#L505-L519

the latest...

> Was just wondering if there was any update on this - I can also start looking into this myself

well, rwkv.cpp has a new implementation if you're interested that...

nearest merge conflict nearly gave us a panic attack