RWKV-LM with ggml-js

Open ansarizafar opened this issue 2 years ago • 13 comments

Is it possible to use RWKV-LM (https://github.com/BlinkDL/RWKV-LM) with ggml-js and Bun (https://bun.sh/)? Is there an example available?

ansarizafar avatar Apr 28 '23 04:04 ansarizafar

Not yet. The problem is that there are no tokenizers in the JS ecosystem yet. There are Node.js bindings for the Hugging Face tokenizers, but I was not able to install them, so I don't know.

I hope I won't need to re-implement yet another BPE tokenizer just to run the example, but I definitely want to run RWKV from Node. And if it runs in Node, it should also run in Bun, because it's just an N-API wrapper plus a little bit of plain JS (no build step, etc.).

cztomsik avatar Apr 28 '23 07:04 cztomsik

BTW: the example is here: https://github.com/cztomsik/ggml-js/blob/main/examples/rwkv.js I'm not sure whether it actually works correctly, but I will figure it out eventually :)

cztomsik avatar Apr 28 '23 08:04 cztomsik

Updated the example; it can now generate some text.

https://user-images.githubusercontent.com/3526922/235147373-1d3b7205-8c4c-4654-940c-78a4baeb4fad.mov

cztomsik avatar Apr 28 '23 12:04 cztomsik

The example code is too difficult to understand for developers new to ML and LLMs. Is it possible to create an abstraction/lib to make it easier to use?

ansarizafar avatar Apr 29 '23 05:04 ansarizafar

Yeah, it's just a PoC :) Do you have any specific API in mind? What are you trying to do?

BTW: Bun does not work currently. It seems that napi_set_instance_data() is not implemented/exported in Bun yet. https://github.com/oven-sh/bun/issues/158#issuecomment-1528890831

cztomsik avatar Apr 29 '23 22:04 cztomsik

I want to use RWKV with langchainjs (https://js.langchain.com/docs/), so an adapter for langchainjs would be great.

ansarizafar avatar Apr 30 '23 05:04 ansarizafar

I see, but that's unlikely, at least in the short term. I can move the RWKV model from the examples into the main package, but a lot of functionality is still missing before it is practically useful (top_k/top_p sampling, repetition penalty, fixing mmap vs. no_alloc, async, etc.), and I would definitely rather focus on that.

Sorry. But you should be able to easily create your own package and use this lib as a dependency.
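
For readers unfamiliar with the top_k/top_p sampling mentioned above, here is a minimal sketch of the idea in plain JS. It works on an ordinary array of logits rather than ggml-js tensors, and `softmax`, `sampleTopKTopP`, and the option names are hypothetical helpers written for illustration, not part of ggml-js.

```javascript
// Sketch only: plain-array sampling, not tied to any ggml-js API.
function softmax(logits, temperature = 1.0) {
  let max = -Infinity;
  for (const x of logits) if (x > max) max = x;
  // Subtract the max before exponentiating for numerical stability.
  const exps = Array.from(logits, (x) => Math.exp((x - max) / temperature));
  let sum = 0;
  for (const e of exps) sum += e;
  return exps.map((e) => e / sum);
}

// Hypothetical helper: returns the id of one sampled token.
function sampleTopKTopP(
  logits,
  { topK = 40, topP = 0.9, temperature = 1.0, random = Math.random } = {}
) {
  const probs = softmax(logits, temperature);

  // Sort token ids by probability, highest first, and keep the top-k.
  const candidates = probs
    .map((p, id) => ({ id, p }))
    .sort((a, b) => b.p - a.p)
    .slice(0, topK);

  // Top-p (nucleus) cut: keep the smallest prefix whose mass reaches topP.
  let cumulative = 0;
  const kept = [];
  for (const c of candidates) {
    kept.push(c);
    cumulative += c.p;
    if (cumulative >= topP) break;
  }

  // Draw from the kept tokens, scaled by their total mass.
  let r = random() * cumulative;
  for (const c of kept) {
    r -= c.p;
    if (r <= 0) return c.id;
  }
  return kept[kept.length - 1].id; // guard against rounding error
}
```

With `topK: 1` this reduces to greedy (argmax) decoding; larger `topK` and `topP` values trade determinism for variety.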

cztomsik avatar Apr 30 '23 07:04 cztomsik

JS tokenizer here https://github.com/josephrocca/rwkv-v4-web

BlinkDL avatar Apr 30 '23 19:04 BlinkDL

@BlinkDL thanks, but I couldn't get it working. I did a quick and dirty implementation here, though, and it seems to work. 🤷 https://github.com/cztomsik/ggml-js/blob/main/lib/tokenizer.js#L63

cztomsik avatar May 02 '23 07:05 cztomsik

@cztomsik Try my unit tests: bottom of https://github.com/BlinkDL/ChatRWKV/blob/main/tokenizer/rwkv_tokenizer.py

BlinkDL avatar May 02 '23 13:05 BlinkDL

It's broken, thanks 😆

cztomsik avatar May 03 '23 06:05 cztomsik

Tokenizer fixed, and mmap/no_alloc fixed too (it can now load the 1B Raven model without having to copy the weights first).

Next up are sampling, stopping at the end token, and async (I'm not really sure how that will map to ggml).
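
As a rough illustration of how stopping at the end token and async could fit together on the JS side, here is a sketch built on an async generator. Everything in it is an assumption: `step(token, state)` stands in for one RWKV forward step returning `{ logits, state }`, `sample(logits)` for any sampler, and `endToken = 0` is only a placeholder default. None of these names come from ggml-js.

```javascript
// Sketch only: `step` and `sample` are hypothetical callbacks, not ggml-js APIs.
async function* generate(
  step,
  sample,
  promptTokens,
  { endToken = 0, maxTokens = 100 } = {}
) {
  if (promptTokens.length === 0) throw new Error("prompt must not be empty");

  let state = null;
  let logits = null;

  // Feed the prompt one token at a time, carrying the recurrent state forward.
  for (const token of promptTokens) {
    ({ logits, state } = await step(token, state));
  }

  for (let i = 0; i < maxTokens; i++) {
    const next = sample(logits);
    if (next === endToken) return; // stop at the end token without emitting it
    yield next;
    ({ logits, state } = await step(next, state));
    // Give the event loop a turn so a long generation does not starve other work.
    await new Promise((resolve) => setTimeout(resolve, 0));
  }
}
```

A caller would consume it with `for await (const token of generate(step, sample, prompt)) { ... }` and can stop early simply by breaking out of the loop.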

cztomsik avatar May 03 '23 22:05 cztomsik

The RWKV example now works with f16 matrices. Run python rwkv-convert.py <model> --mtype f16 to generate a smaller file.

For example, this is the 3B Raven model.

https://user-images.githubusercontent.com/3526922/236181718-30adc57a-d571-4ae4-9ab8-12f73147af7c.mov

Q4 and Q8 should work too, but they are not supported in the conversion script yet.

As you can see, the bigger problem right now is the sampling.
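
One common way to reduce repetitive output is a repetition penalty applied to the logits before sampling. The sketch below uses the divide-or-multiply formulation found in several LLM runtimes; `applyRepetitionPenalty` and its default of 1.2 are hypothetical and are not taken from ggml-js.

```javascript
// Sketch only: hypothetical helper working on a plain array of logits.
function applyRepetitionPenalty(logits, recentTokens, penalty = 1.2) {
  const out = Array.from(logits); // copy so the caller's logits are untouched
  for (const id of new Set(recentTokens)) {
    // Dividing a positive logit and multiplying a negative one
    // both make an already-seen token less likely.
    out[id] = out[id] > 0 ? out[id] / penalty : out[id] * penalty;
  }
  return out;
}
```

The penalised logits can then be passed to whatever sampler is in use; a penalty of 1.0 leaves them unchanged.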

cztomsik avatar May 04 '23 10:05 cztomsik