Kerfuffle
Oh, nice. I look forward to ripping off their ide... I mean collaborating in the spirit of open source. *** This is probably poor etiquette but there's no ability to...
@saharNooby > Looks like our work is closely related by extending ggml, but diverges at actual implementation of the model -- you do it in Rust, I do it in...
Interesting. Is there a reason to implement those elementwise operations all separately instead of adding a generic elementwise map operation? The matrix multiplications matter so much with this, it's crazy....
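To make the "generic elementwise map" idea concrete, here is a minimal sketch in plain Rust over `f32` slices. It is not GGML's API: `map_unary` and `map_binary` are hypothetical names, and a real tensor library would also have to handle shapes, strides and threading.

```rust
/// Apply `f` to every element of `src`, writing the result into `dst`.
/// (Hypothetical helper, not part of GGML.)
fn map_unary(dst: &mut [f32], src: &[f32], f: impl Fn(f32) -> f32) {
    assert_eq!(dst.len(), src.len(), "length mismatch");
    for (d, &s) in dst.iter_mut().zip(src) {
        *d = f(s);
    }
}

/// Combine `a` and `b` element by element with `f`, writing into `dst`.
/// (Hypothetical helper, not part of GGML.)
fn map_binary(dst: &mut [f32], a: &[f32], b: &[f32], f: impl Fn(f32, f32) -> f32) {
    assert_eq!(dst.len(), a.len(), "length mismatch");
    assert_eq!(dst.len(), b.len(), "length mismatch");
    for ((d, &x), &y) in dst.iter_mut().zip(a).zip(b) {
        *d = f(x, y);
    }
}
```

With these two, operations such as add, max or exp become one-line closures, e.g. `map_binary(&mut out, &a, &b, |x, y| x + y)`, instead of separate hand-written kernels.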
I can't really help you with the C++ part. Come over to the Rust side! In seriousness though, you may well end up doing less work overall if you take...
It might be getting annoying that I'm writing so many comments here, but: I've been working on my Rust RWKV implementation and got 8bit quantization working. I also managed to split it...
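For readers unfamiliar with 8-bit quantization, a minimal sketch follows. It assumes symmetric per-row absmax scaling to `i8`, which is one common scheme and not necessarily the one used in the implementation mentioned above; `QuantizedRow`, `quantize_row` and `dequantize_row` are hypothetical names.

```rust
/// One row of weights stored as i8 plus a single f32 scale.
/// (Hypothetical type for illustration.)
struct QuantizedRow {
    scale: f32,
    values: Vec<i8>,
}

/// Quantize a row with symmetric absmax scaling: the largest magnitude maps to 127.
fn quantize_row(row: &[f32]) -> QuantizedRow {
    let max_abs = row.iter().fold(0.0f32, |m, &x| m.max(x.abs()));
    // Avoid dividing by zero for an all-zero row.
    let scale = if max_abs == 0.0 { 1.0 } else { max_abs / 127.0 };
    let values = row
        .iter()
        .map(|&x| (x / scale).round().clamp(-127.0, 127.0) as i8)
        .collect();
    QuantizedRow { scale, values }
}

/// Recover approximate f32 values from the quantized row.
fn dequantize_row(q: &QuantizedRow) -> Vec<f32> {
    q.values.iter().map(|&v| v as f32 * q.scale).collect()
}
```

Each value costs one byte instead of four, at the price of a rounding error of roughly half a `scale` step per element.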
@saharNooby Uhhh... I basically cargo culted it from the official version so I don't know that I can give you a good answer here. See: 1. https://github.com/BlinkDL/ChatRWKV/blob/0d0abf181356c6f27501274cad18bdf28c83a45b/rwkv_pip_package/src/rwkv/model.py#L237 2. https://github.com/BlinkDL/ChatRWKV/blob/0d0abf181356c6f27501274cad18bdf28c83a45b/rwkv_pip_package/src/rwkv/model.py#L335 The...
I've been messing around trying to allow GGML to map arbitrary operations: https://github.com/KerfuffleV2/llama-rs/blob/5fd882035e95501d4127e30c84a838afbffcc95e/ggml/src/lib.rs#L207 This is what it looks like in use: https://github.com/KerfuffleV2/llama-rs/blob/5fd882035e95501d4127e30c84a838afbffcc95e/llama-rs/src/lib.rs#L1310 The first one is just replacing the `ggml_add` operation,...
I think this is a great idea. Also, it's probably even more of a reason to decouple llama-rs from the GGML crates, and I would think what you're talking...
I found this crate which looks pretty interesting: https://crates.io/crates/dagga It's for scheduling directed acyclic graphs (like GGML's graph, and I assume other ML type graphs would be similar). You can...
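As a rough illustration of what scheduling a directed acyclic graph means, here is a standard-library-only sketch of Kahn's topological sort. It does not use or imitate dagga's API; the `schedule` function and its `deps` representation are assumptions made for this example.

```rust
use std::collections::VecDeque;

/// `deps[i]` lists the nodes that must run before node `i`.
/// Returns a valid execution order, or `None` if the graph contains a cycle.
/// (Hypothetical helper; panics if a dependency index is out of range.)
fn schedule(deps: &[Vec<usize>]) -> Option<Vec<usize>> {
    let n = deps.len();
    let mut indegree = vec![0usize; n];
    let mut dependents: Vec<Vec<usize>> = vec![Vec::new(); n];
    for (node, ds) in deps.iter().enumerate() {
        for &d in ds {
            indegree[node] += 1;
            dependents[d].push(node);
        }
    }

    // Start with every node that has no unmet dependencies.
    let mut ready: VecDeque<usize> = (0..n).filter(|&i| indegree[i] == 0).collect();
    let mut order = Vec::with_capacity(n);
    while let Some(node) = ready.pop_front() {
        order.push(node);
        for &next in &dependents[node] {
            indegree[next] -= 1;
            if indegree[next] == 0 {
                ready.push_back(next);
            }
        }
    }

    // Nodes left unscheduled mean there was a cycle.
    if order.len() == n { Some(order) } else { None }
}
```

Nodes that sit in the ready queue at the same time have no ordering constraint between them, which is where a scheduler could run operations in parallel.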
Sort of related to speeding up loading, I've been messing around with rewriting it to use a `mmap`-based approach and nom. I don't know if it's really on the right...
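The appeal of combining `mmap` with a parser-combinator style is that parsers can hand back sub-slices of the mapped bytes without copying. A minimal sketch of that style using only the standard library is below. It uses neither nom nor an actual memory map, and the length-prefixed record layout is a made-up example rather than the real model file format.

```rust
/// Split a little-endian u32 off the front of `input`.
/// Returns the value and the remaining bytes, or `None` if the input is too short.
fn read_u32_le(input: &[u8]) -> Option<(u32, &[u8])> {
    if input.len() < 4 {
        return None;
    }
    let (head, rest) = input.split_at(4);
    let value = u32::from_le_bytes([head[0], head[1], head[2], head[3]]);
    Some((value, rest))
}

/// Parse a hypothetical record: a u32 length followed by that many bytes.
/// Returns (payload, remaining input); the payload borrows from `input`, no copy is made.
fn read_len_prefixed(input: &[u8]) -> Option<(&[u8], &[u8])> {
    let (len, rest) = read_u32_le(input)?;
    let len = len as usize;
    if rest.len() < len {
        return None;
    }
    Some(rest.split_at(len))
}
```

Because every parser takes a `&[u8]` and returns the unconsumed tail, the same functions would work unchanged on a slice backed by a memory-mapped file.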