
nice work, some questions

Open lucasjinreal opened this issue 2 years ago • 3 comments

Does it use any matrix-computation acceleration framework from Rust's library ecosystem? Any plans to take it further, for instance to make it as popular as ggml?

lucasjinreal avatar Aug 01 '23 02:08 lucasjinreal

It's using Rayon for data-parallel matrix-vector multiplication, but no other libraries.
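The shape of a matrix-vector multiply like the one described above can be sketched as follows. This is a minimal illustration, not the repo's actual code; it is written sequentially so it has no dependencies, with a comment noting where Rayon's `par_iter_mut` would parallelize the outer loop.

```rust
// Matrix-vector multiply: out[i] = dot(row i of w, x),
// where w is a row-major (out.len() x x.len()) matrix.
// With Rayon, the outer loop becomes data-parallel:
//   out.par_iter_mut().enumerate().for_each(|(i, o)| { ... })
fn matvec(out: &mut [f32], w: &[f32], x: &[f32]) {
    let n = x.len();
    for (i, o) in out.iter_mut().enumerate() {
        let row = &w[i * n..(i + 1) * n];
        *o = row.iter().zip(x).map(|(&a, &b)| a * b).sum();
    }
}
```

Each output element is an independent dot product, which is why parallelizing over rows is safe and effective.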

See the Rust library Candle, which has a full implementation with matrix multiplies.

I was thinking I would try implementing GGML-style quantization. Any other features you would want?
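The GGML-style quantization mentioned above could be sketched roughly like this: weights are split into small blocks, each stored as a per-block f32 scale plus 8-bit integers. This is an assumption-laden illustration (symmetric int8 scaling, toy block size, not GGML's actual on-disk layout):

```rust
// One quantized block: a shared scale plus int8 values.
// GGML's Q8_0 uses 32-element blocks; any block size works here.
struct QBlock {
    scale: f32,
    qs: Vec<i8>,
}

// Symmetric quantization: pick scale so the largest magnitude maps to 127.
fn quantize(block: &[f32]) -> QBlock {
    let amax = block.iter().fold(0f32, |m, &v| m.max(v.abs()));
    let scale = if amax == 0.0 { 1.0 } else { amax / 127.0 };
    let qs = block.iter().map(|&v| (v / scale).round() as i8).collect();
    QBlock { scale, qs }
}

// Dequantize back to f32 (lossy: values snap to multiples of scale).
fn dequantize(q: &QBlock) -> Vec<f32> {
    q.qs.iter().map(|&v| v as f32 * q.scale).collect()
}
```

The payoff is a roughly 4x memory reduction versus f32, at the cost of small rounding error per block.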

srush avatar Aug 01 '23 12:08 srush

Yes, I'd like to ask a few more questions:

  1. Will it support ARM? Furthermore, ARM with fp16? If so, Macs with M1 or M2 could run it very happily.
  2. I think Rust can build a very impressive ecosystem compared with C or C++, thanks to its extensive libraries like termUI or tauri. They could be used to build a top-level chat UI painlessly, seamlessly integrated with an inference core in Rust. Would you consider structuring it as an inference core so that people can build their own UIs on top of it?

lucasjinreal avatar Aug 02 '23 02:08 lucasjinreal

  1. It should work fine with ARM, but currently it is f32 only. (Note this is CPU only, no GPU support.) I'll have to think about how to add f16.
  2. I'm pretty new to Rust, but I think people will come up with lots of ways to use it!
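For context on what "adding f16" involves, here is a hedged sketch of an f32-to-f16 bit conversion in plain Rust. Real code would likely use the `half` crate; this version truncates the mantissa and flushes subnormals to zero for brevity, so it is illustrative only:

```rust
// Convert an f32 to IEEE 754 half-precision bits.
// Simplifications: truncation instead of round-to-nearest-even,
// and subnormal f16 results are flushed to signed zero.
fn f32_to_f16_bits(x: f32) -> u16 {
    let bits = x.to_bits();
    let sign = ((bits >> 16) & 0x8000) as u16;
    let exp = ((bits >> 23) & 0xff) as i32;
    let mant = bits & 0x007f_ffff;
    if exp == 0xff {
        // Inf or NaN: keep a nonzero mantissa bit for NaN.
        return sign | 0x7c00 | if mant != 0 { 0x200 } else { 0 };
    }
    let new_exp = exp - 127 + 15; // rebias from f32 (127) to f16 (15)
    if new_exp >= 0x1f {
        return sign | 0x7c00; // overflow -> infinity
    }
    if new_exp <= 0 {
        return sign; // underflow -> zero (subnormals dropped)
    }
    sign | ((new_exp as u16) << 10) | ((mant >> 13) as u16)
}
```

Storing weights as u16 like this halves memory, with values converted back to f32 for arithmetic on CPUs without native f16 support.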

srush avatar Aug 02 '23 14:08 srush