Josh Leverette

132 comments by Josh Leverette

I'm getting this as well. As someone who isn't hugely familiar with the node ecosystem, I'm not sure how concerned I should be about this.

https://github.com/ggerganov/llama.cpp/pull/6965 has been merged now. I'm unclear when things were fixed in ollama, but I just tested with 0.1.35, and I can't reproduce it anymore. Closing.

I’ve been encountering the issue with codegemma, for what it’s worth.

The 20B and 34B models are working great on the 0.1.39 pre-release version of ollama: https://ollama.com/library/granite-code

The underlying llama.cpp library still does not support the smaller Granite models, from what...

The PR that adds support for the smaller Granite models was just merged into llama.cpp: https://github.com/ggerganov/llama.cpp/pull/7481 Hopefully we can get those models in ollama soon.

@ericcurtin I’m not a maintainer here, but I’ll point out that [IBM Granite](https://huggingface.co/ibm-granite/granite-7b-base) is largely unrelated to the IBM Granite Code models, and I’ve had the opposite anecdotal experience.

@Saik0s I appreciate how much work you've put into this, and that it's open source. I know the cloud subscription would be a way to make money off of...

This issue was marked as stale, but shouldn’t supporting more efficient architectures be a priority?

Now Google has released a [9B version of RecurrentGemma](https://huggingface.co/google/recurrentgemma-9b-it) ([arxiv link](https://arxiv.org/pdf/2404.07839)), which seems to score similarly to Gemma-7b while supposedly being _far_ more efficient:

![max_throughput](https://github.com/ggerganov/llama.cpp/assets/726063/c248e959-e654-42d1-8932-03e86b0212de) ([source](https://huggingface.co/google/recurrentgemma-9b-it#throughput))

Any chance llama.cpp can...

Stalebot is an annoying concept. People hate it when commenters leave low-effort comments like "bump", but then stalebot closes the issue if no one does. RecurrentGemma *still* isn't...