Results: 208 comments of setzer22

@danShumway I'm sorry, I left exwm a while ago. It's an awesome piece of software, but I needed something more stable for my day-to-day work :sweat_smile:. I can tell, though,...

I am also seeing very similar behavior on Linux. It makes sense because:

- Each time the game runs, the DLL is read from disk, so if there are...

@im-not-tom I got this working [on my project](https://github.com/setzer22/llama-rs/pull/14) and those are basically the steps I followed :+1: I have verified this works by saving memory, restoring on a different process...

This could be a discrepancy in size due to integer promotion rules / a potential overflow, since the sizes for 30B are gonna be larger. More liberal use of usize...
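To illustrate the kind of overflow being suggested here: with 30B-class tensor dimensions, a size computed in 32-bit arithmetic can exceed `i32::MAX` even though each factor is small. The dimensions below are illustrative stand-ins, not exact llama-rs values; a minimal sketch:

```rust
fn main() {
    // Hypothetical 30B-class dimensions (illustrative only).
    let n_layer: i32 = 60;
    let n_ctx: i32 = 2048;
    let n_embd: i32 = 6656;
    let elem_size: i32 = 4; // bytes per f32

    // The same product in 32-bit arithmetic overflows: checked_mul
    // returns None once the running product exceeds i32::MAX.
    let in_i32 = n_layer
        .checked_mul(n_ctx)
        .and_then(|v| v.checked_mul(n_embd))
        .and_then(|v| v.checked_mul(elem_size));
    assert!(in_i32.is_none());

    // In usize (64-bit on typical targets) the buffer size fits fine:
    // 60 * 2048 * 6656 * 4 = 3_271_557_120 bytes (~3.05 GiB).
    let bytes = n_layer as usize * n_ctx as usize * n_embd as usize * elem_size as usize;
    assert_eq!(bytes, 3_271_557_120);

    println!("ok: {} bytes", bytes);
}
```

Computing every size in `usize` from the start avoids the intermediate promotion question entirely, which is the spirit of the fix proposed above.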

Pushed! Sorry about that :sweat_smile: It was just a typo on my end.

I was just able to load 30B with the changes on main, but I'll wait for others to confirm before closing the issue.

@RCasatta you mean the f16 version? Yes, I wasn't able to load that one on my machine (32GB). But I'm able to load the quantized one just fine. Anyway, closing...

Oh, before I forget: I also tried using the [snap](https://docs.rs/snap/latest/snap/) crate for compression. Some quick results:

- Compression ratios for the prompt I'm sharing above look good. A cached prompt...

> Looks good to me. Would this also enable continuing an existing generation that ran out before its end-of-text?

Not directly, but it would be a step in the right...

Alright, this was quite a major cleanup session! :smile: As discussed, I've broken down the old `infer_with_prompt` function into two functions: `feed_prompt` and `infer_next_token`. This helps untangle the mess the original...
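The shape of that split can be sketched with a toy session type. The names follow the comment above, but the struct and the "model" inside are hypothetical stand-ins, not llama-rs internals:

```rust
/// Toy stand-in for an inference session; the real one holds model state.
struct InferenceSession {
    tokens: Vec<u32>,
}

impl InferenceSession {
    fn new() -> Self {
        Self { tokens: Vec::new() }
    }

    /// Ingest the whole prompt at once, updating state without sampling.
    fn feed_prompt(&mut self, prompt: &[u32]) {
        self.tokens.extend_from_slice(prompt);
    }

    /// Sample exactly one token from the current state.
    fn infer_next_token(&mut self) -> u32 {
        // Toy "model": next token is last token + 1.
        let next = self.tokens.last().map_or(0, |t| t + 1);
        self.tokens.push(next);
        next
    }
}

fn main() {
    let mut session = InferenceSession::new();
    session.feed_prompt(&[1, 2, 3]);
    let next = session.infer_next_token();
    assert_eq!(next, 4);
    println!("next token: {}", next);
}
```

Separating the two calls lets callers feed a (possibly cached) prompt once and then drive generation token by token, which is what makes things like saving/restoring memory composable.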