RWKV-LM with ggml-js
Is it possible to use RWKV-LM (https://github.com/BlinkDL/RWKV-LM) with ggml-js and Bun (https://bun.sh/)? Is there any example available?
Not yet. The problem is that there are no tokenizers in the JS ecosystem yet. There are Node.js bindings for the Hugging Face tokenizers, but I was not able to install them. So I don't know.
I hope I won't need to re-implement another BPE tokenizer just to run the example, but I definitely want to run RWKV from Node. And if it runs in Node, it should also run in Bun, because it's just an N-API wrapper plus a little bit of plain JS (no build step, etc.).
BTW: the example is here: https://github.com/cztomsik/ggml-js/blob/main/examples/rwkv.js. I'm not sure if it actually works correctly, but I will figure it out eventually :)
Updated the example; it can now generate some text.
https://user-images.githubusercontent.com/3526922/235147373-1d3b7205-8c4c-4654-940c-78a4baeb4fad.mov
The example code is too difficult to understand for developers new to ML and LLMs. Is it possible to create an abstraction/lib to make it easy to use?
Yeah, it's just a PoC :) Do you have any specific API in mind? What are you trying to do?
BTW: Bun does not work currently - it seems that napi_set_instance_data() is not implemented/exported in Bun yet.
https://github.com/oven-sh/bun/issues/158#issuecomment-1528890831
I want to use RWKV with langchainjs (https://js.langchain.com/docs/), so an adapter for langchainjs would be great.
I see, but that's unlikely, at least in the short term. I can move the RWKV model from the examples to the main package, but a lot of functionality is still missing for it to be practically useful (top_k/top_p sampling, repetition penalty, fixing mmap vs. no_alloc, async, etc.), and I'd definitely rather focus on those first.
Sorry. But you should be able to easily create your own package and use this lib as a dependency.
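For illustration only, such an adapter package could look roughly like this sketch, assuming langchainjs's `LLM` base class (the import path may differ by version) and a hypothetical RWKV wrapper with a `generate()` method - these names are placeholders, not the actual ggml-js API:

```js
// Hypothetical adapter: wraps an (assumed) RWKV model object in a langchainjs LLM.
// `model.generate()` is a placeholder name, not the real ggml-js API.
import { LLM } from 'langchain/llms/base'

export class RwkvLLM extends LLM {
  constructor(fields) {
    super(fields ?? {})
    this.model = fields.model // an already-loaded RWKV model instance
  }

  _llmType() {
    return 'rwkv'
  }

  async _call(prompt) {
    // delegate to the underlying model; the signature here is assumed
    return this.model.generate(prompt, { maxTokens: 256 })
  }
}
```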
JS tokenizer here https://github.com/josephrocca/rwkv-v4-web
@BlinkDL thanks, but I couldn't get it working. I did a quick and dirty impl here instead, and it seems to work. 🤷 https://github.com/cztomsik/ggml-js/blob/main/lib/tokenizer.js#L63
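For anyone curious, the general idea is a greedy longest-match over the vocabulary, roughly like the sketch below (not the actual ggml-js implementation; the vocab format is assumed):

```js
// Minimal greedy longest-match tokenizer sketch (assumed vocab: Map<string, number>).
// The real RWKV tokenizers are BPE/trie based; this is only an illustration.
function tokenize(text, vocab, maxTokenLen = 16) {
  const ids = []
  let i = 0
  while (i < text.length) {
    let match = null
    // try the longest possible piece first
    for (let len = Math.min(maxTokenLen, text.length - i); len > 0; len--) {
      const piece = text.slice(i, i + len)
      if (vocab.has(piece)) {
        match = piece
        break
      }
    }
    if (!match) throw new Error(`no token for ${JSON.stringify(text[i])}`)
    ids.push(vocab.get(match))
    i += match.length
  }
  return ids
}
```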
@cztomsik Try my unit tests: bottom of https://github.com/BlinkDL/ChatRWKV/blob/main/tokenizer/rwkv_tokenizer.py
It's broken, thanks 😆
The tokenizer is fixed, and mmap/no_alloc is fixed too (it can now load the 1B Raven model without having to copy the weights first).
Next up: sampling, stopping at the end token, and async (I'm not really sure how that will map to ggml).
The RWKV example now works with f16 matrices. Run `python rwkv-convert.py <model> --mtype f16` to generate a smaller file.
For example, this is the 3B Raven model:
https://user-images.githubusercontent.com/3526922/236181718-30adc57a-d571-4ae4-9ab8-12f73147af7c.mov
Q4 and Q8 should work too, but they are not supported in the conversion script yet.
As you can see, the bigger problem right now is the sampling.
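For what it's worth, temperature + top-p (nucleus) sampling over a logits array is fairly small in plain JS; a rough sketch (not part of ggml-js, names assumed):

```js
// Sketch of temperature + top-p (nucleus) sampling over raw logits.
// Not the ggml-js implementation; just an illustration of the missing piece.
function sampleTopP(logits, { temperature = 1.0, topP = 0.9 } = {}) {
  // softmax with temperature
  const scaled = logits.map(l => l / temperature)
  const max = Math.max(...scaled)
  const exps = scaled.map(l => Math.exp(l - max))
  const sum = exps.reduce((a, b) => a + b, 0)
  const probs = exps.map(e => e / sum)

  // keep the smallest set of tokens whose cumulative probability reaches topP
  const sorted = probs.map((p, id) => [p, id]).sort((a, b) => b[0] - a[0])
  let cum = 0
  const kept = []
  for (const [p, id] of sorted) {
    kept.push([p, id])
    cum += p
    if (cum >= topP) break
  }

  // draw one token from the kept set (renormalized by cum)
  let r = Math.random() * cum
  for (const [p, id] of kept) {
    r -= p
    if (r <= 0) return id
  }
  return kept[kept.length - 1][1]
}
```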