RWKV-LM with ggml-js
Is it possible to use RWKV-LM (https://github.com/BlinkDL/RWKV-LM) with ggml-js and Bun (https://bun.sh/)? Is there any example available?
Not yet. The problem is that there are no tokenizers in the JS ecosystem yet. There are Node.js bindings for the Hugging Face tokenizers, but I was not able to install them. So I don't know.
I hope I won't need to re-implement another BPE tokenizer just to run the example, but I definitely want to run RWKV from Node. And if it runs in Node, it should also run in Bun, because it's just an N-API wrapper plus a little bit of plain JS (no build step, etc.).
BTW: the example is here: https://github.com/cztomsik/ggml-js/blob/main/examples/rwkv.js. I'm not sure if it actually works correctly, but I will figure it out eventually :)
Updated the example; it can now generate some text.
https://user-images.githubusercontent.com/3526922/235147373-1d3b7205-8c4c-4654-940c-78a4baeb4fad.mov
The example code is too difficult to understand for developers new to ML and LLMs. Is it possible to create an abstraction/lib to make it easy to use?
Yeah, it's just a PoC :) Do you have any specific API in mind? What are you trying to do?
BTW: Bun does not work currently - it seems that napi_set_instance_data() is not implemented/exported in Bun yet.
https://github.com/oven-sh/bun/issues/158#issuecomment-1528890831
I want to use RWKV with langchainjs (https://js.langchain.com/docs/), so an adapter for langchainjs would be great.
I see, but that's unlikely, at least in the short term. I can move the RWKV model from the examples to the main package, but a lot of functionality is still missing for it to be practically useful (top_k/top_p sampling, repetition penalty, fixing mmap vs. no_alloc, async, etc.), and I'd definitely rather focus on those first.
Sorry. But you should be able to easily create your own package and use this lib as a dependency.
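For illustration only, such an adapter package could look roughly like this sketch, assuming langchainjs's `LLM` base class (the import path may differ by version) and a hypothetical RWKV wrapper with a `generate()` method - these names are placeholders, not the actual ggml-js API:

```js
// Hypothetical adapter: wraps an (assumed) RWKV model object in a langchainjs LLM.
// `model.generate()` is a placeholder name, not the real ggml-js API.
import { LLM } from 'langchain/llms/base'

export class RwkvLLM extends LLM {
  constructor(fields) {
    super(fields ?? {})
    this.model = fields.model // an already-loaded RWKV model instance
  }

  _llmType() {
    return 'rwkv'
  }

  async _call(prompt) {
    // delegate to the underlying model; the signature here is assumed
    return this.model.generate(prompt, { maxTokens: 256 })
  }
}
```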
JS tokenizer here https://github.com/josephrocca/rwkv-v4-web
@BlinkDL thanks, but I couldn't get it working. I did a quick and dirty impl here instead, and it seems to work. 🤷 https://github.com/cztomsik/ggml-js/blob/main/lib/tokenizer.js#L63
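For anyone curious, the general idea is a greedy longest-match over the vocabulary, roughly like the sketch below (not the actual ggml-js implementation; the vocab format is assumed):

```js
// Minimal greedy longest-match tokenizer sketch (assumed vocab: Map<string, number>).
// The real RWKV tokenizers are BPE/trie based; this is only an illustration.
function tokenize(text, vocab, maxTokenLen = 16) {
  const ids = []
  let i = 0
  while (i < text.length) {
    let match = null
    // try the longest possible piece first
    for (let len = Math.min(maxTokenLen, text.length - i); len > 0; len--) {
      const piece = text.slice(i, i + len)
      if (vocab.has(piece)) {
        match = piece
        break
      }
    }
    if (!match) throw new Error(`no token for ${JSON.stringify(text[i])}`)
    ids.push(vocab.get(match))
    i += match.length
  }
  return ids
}
```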
@cztomsik Try my unit tests: bottom of https://github.com/BlinkDL/ChatRWKV/blob/main/tokenizer/rwkv_tokenizer.py
It's broken, thanks 😆
The tokenizer is fixed, and mmap/no_alloc is fixed too (it can now load the 1B Raven model without having to copy the weights first).
Next up: sampling, stopping at the end token, and async (I'm not really sure how that will map to ggml).
The RWKV example now works with f16 matrices. Run `python rwkv-convert.py <model> --mtype f16` to generate a smaller file.
For example, this is the 3B Raven model:
https://user-images.githubusercontent.com/3526922/236181718-30adc57a-d571-4ae4-9ab8-12f73147af7c.mov
Q4 and Q8 should work too, but they are not supported in the conversion script yet.
As you can see, the bigger problem right now is the sampling.
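For what it's worth, temperature + top-p (nucleus) sampling over a logits array is fairly small in plain JS; a rough sketch (not part of ggml-js, names assumed):

```js
// Sketch of temperature + top-p (nucleus) sampling over raw logits.
// Not the ggml-js implementation; just an illustration of the missing piece.
function sampleTopP(logits, { temperature = 1.0, topP = 0.9 } = {}) {
  // softmax with temperature
  const scaled = logits.map(l => l / temperature)
  const max = Math.max(...scaled)
  const exps = scaled.map(l => Math.exp(l - max))
  const sum = exps.reduce((a, b) => a + b, 0)
  const probs = exps.map(e => e / sum)

  // keep the smallest set of tokens whose cumulative probability reaches topP
  const sorted = probs.map((p, id) => [p, id]).sort((a, b) => b[0] - a[0])
  let cum = 0
  const kept = []
  for (const [p, id] of sorted) {
    kept.push([p, id])
    cum += p
    if (cum >= topP) break
  }

  // draw one token from the kept set (renormalized by cum)
  let r = Math.random() * cum
  for (const [p, id] of kept) {
    r -= p
    if (r <= 0) return id
  }
  return kept[kept.length - 1][1]
}
```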