CognitiveTech
Per https://github.com/ggerganov/llama.cpp/pull/4963, it seems support is in llama.cpp main and server.
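For reference, that PR exposes self-extend in the llama.cpp server through group-attention flags. A typical invocation might look like the sketch below; the model path is a placeholder, and the exact flag names are assumed from the PR, so check `./server --help` on your build:

```shell
# Sketch: run the llama.cpp server with self-extend (group attention) enabled.
# --grp-attn-n : group-attention factor (how far the trained context is stretched)
# --grp-attn-w : group-attention width (window processed per group)
# Model path below is hypothetical -- substitute your own GGUF file.
./server -m models/your-model.Q4_K_M.gguf \
  -c 8192 \
  --grp-attn-n 4 \
  --grp-attn-w 2048
```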
According to the latest release notes (marking this commit: https://github.com/ollama/ollama/commit/72b12c3be7f7d8b2e0d1fb703e6d6973caff6493), llama.cpp is bumped to [b1999](https://github.com/ggerganov/llama.cpp/releases/tag/b1999), which is from last week, while self-extend support was added 3 weeks ago. So it seems...
Furthermore, it seems relatively trivial to add the required parameters, based on a previous addition shown here: [#276 Configurable Rope Frequency Parameters](https://github.com/ollama/ollama/pull/276/files)
Ok, so I did a little more digging. For one thing, those files have since moved, to here: https://github.com/ollama/ollama/blob/main/api/types.go and https://github.com/ollama/ollama/blob/main/llm/llama.go. For another thing, there are two places where options are...
I got the error just trying to build the app... I'm really in over my head here... I'll just try self-extend where it's easier to pass parameters.
> @cognitivetech Right, but why build it, though? This parameter seems to be well set in llama.cpp, though it's not very functional at the moment until flash attention gets merged there....
Unfortunately for me, this upgrade breaks the app (using pip 23.3 on Ubuntu 22.04):

```
(textgen) jack@irobot:~/github/text-generation-webui$ python server.py
Traceback (most recent call last):
  File "/home/jack/github/text-generation-webui/server.py", line 14, in <module>
    import gradio...
```
https://huggingface.co/NousResearch/Hermes-2-Pro-Mistral-7B > Hermes 2 Pro is an upgraded, retrained version of Nous Hermes 2, consisting of an updated and cleaned version of the OpenHermes 2.5 Dataset, as well as a...
I downloaded the latest release and tried to run Qwen on Ubuntu 20; still no luck here. And it requires restarting the ollama service... `Feb 09 14:15:26 scrap ollama[1726]: error loading model:...`