Nemo Mistral error: there was an error with the local ollama instance
(base) ferran@z590i:~$ flatpak run com.jeffser.Alpaca
F: Not sharing "/usr/share" with sandbox: Path "/usr" is reserved by Flatpak
F: Not sharing "/usr/share/themes" with sandbox: Path "/usr" is reserved by Flatpak
INFO [main.py | main] Alpaca version: 1.1.1
INFO [local_instance.py | start] Starting Alpaca's Ollama instance...
2024/08/19 21:08:16 routes.go:1108: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://127.0.0.1:11435 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/ferran/.var/app/com.jeffser.Alpaca/data/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-08-19T21:08:16.982+02:00 level=INFO source=images.go:781 msg="total blobs: 44"
time=2024-08-19T21:08:16.982+02:00 level=INFO source=images.go:788 msg="total unused blobs removed: 0"
time=2024-08-19T21:08:16.983+02:00 level=INFO source=routes.go:1155 msg="Listening on 127.0.0.1:11435 (version 0.3.3)"
time=2024-08-19T21:08:16.984+02:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/home/ferran/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama1103925684/runners
INFO [local_instance.py | start] Started Alpaca's Ollama instance
INFO [local_instance.py | start] Ollama version: 0.3.3
time=2024-08-19T21:08:21.898+02:00 level=INFO source=payload.go:44 msg="Dynamic LLM libraries [cpu_avx2 cuda_v11 rocm_v60102 cpu cpu_avx]"
time=2024-08-19T21:08:21.898+02:00 level=INFO source=gpu.go:205 msg="looking for compatible GPUs"
time=2024-08-19T21:08:22.134+02:00 level=INFO source=types.go:105 msg="inference compute" id=GPU-d8759212-99fb-5816-f4d7-aa3b8079b843 library=cuda compute=8.6 driver=0.0 name="" total="7.7 GiB" available="263.7 MiB"
[GIN] 2024/08/19 - 21:08:22 | 200 | 1.198305ms | 127.0.0.1 | GET "/api/tags"
[GIN] 2024/08/19 - 21:08:22 | 200 | 1.161983ms | 127.0.0.1 | GET "/api/tags"
time=2024-08-19T21:08:37.627+02:00 level=INFO source=memory.go:309 msg="offload to cuda" layers.requested=-1 layers.model=41 layers.offload=0 layers.split="" memory.available="[236.2 MiB]" memory.required.full="6.5 GiB" memory.required.partial="0 B" memory.required.kv="320.0 MiB" memory.required.allocations="[0 B]" memory.weights.total="6.0 GiB" memory.weights.repeating="5.5 GiB" memory.weights.nonrepeating="525.0 MiB" memory.graph.full="172.0 MiB" memory.graph.partial="801.0 MiB"
time=2024-08-19T21:08:37.628+02:00 level=INFO source=server.go:384 msg="starting llama server" cmd="/home/ferran/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama1103925684/runners/cpu_avx2/ollama_llama_server --model /home/ferran/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-b559938ab7a0392fc9ea9675b82280f2a15669ec3e0e0fc491c9cb0a7681cf94 --ctx-size 2048 --batch-size 512 --embedding --log-disable --no-mmap --parallel 1 --port 40157"
time=2024-08-19T21:08:37.628+02:00 level=INFO source=sched.go:445 msg="loaded runners" count=1
time=2024-08-19T21:08:37.628+02:00 level=INFO source=server.go:584 msg="waiting for llama runner to start responding"
time=2024-08-19T21:08:37.628+02:00 level=INFO source=server.go:618 msg="waiting for server to become available" status="llm server error"
INFO [main] build info | build=1 commit="6eeaeba" tid="140080432502656" timestamp=1724094517
INFO [main] system info | n_threads=8 n_threads_batch=-1 system_info="AVX = 1 | AVX_VNNI = 0 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | AVX512_BF16 = 0 | FMA = 1 | NEON = 0 | SVE = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | SSSE3 = 1 | VSX = 0 | MATMUL_INT8 = 0 | LLAMAFILE = 1 | " tid="140080432502656" timestamp=1724094517 total_threads=16
INFO [main] HTTP server listening | hostname="127.0.0.1" n_threads_http="15" port="40157" tid="140080432502656" timestamp=1724094517
llama_model_loader: loaded meta data with 35 key-value pairs and 363 tensors from /home/ferran/.var/app/com.jeffser.Alpaca/data/.ollama/models/blobs/sha256-b559938ab7a0392fc9ea9675b82280f2a15669ec3e0e0fc491c9cb0a7681cf94 (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = llama
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Mistral Nemo Instruct 2407
llama_model_loader: - kv 3: general.version str = 2407
llama_model_loader: - kv 4: general.finetune str = Instruct
llama_model_loader: - kv 5: general.basename str = Mistral-Nemo
llama_model_loader: - kv 6: general.size_label str = 12B
llama_model_loader: - kv 7: general.license str = apache-2.0
llama_model_loader: - kv 8: general.languages arr[str,9] = ["en", "fr", "de", "es", "it", "pt", ...
llama_model_loader: - kv 9: llama.block_count u32 = 40
llama_model_loader: - kv 10: llama.context_length u32 = 1024000
llama_model_loader: - kv 11: llama.embedding_length u32 = 5120
llama_model_loader: - kv 12: llama.feed_forward_length u32 = 14336
llama_model_loader: - kv 13: llama.attention.head_count u32 = 32
llama_model_loader: - kv 14: llama.attention.head_count_kv u32 = 8
llama_model_loader: - kv 15: llama.rope.freq_base f32 = 1000000.000000
llama_model_loader: - kv 16: llama.attention.layer_norm_rms_epsilon f32 = 0.000010
llama_model_loader: - kv 17: llama.attention.key_length u32 = 128
llama_model_loader: - kv 18: llama.attention.value_length u32 = 128
llama_model_loader: - kv 19: general.file_type u32 = 2
llama_model_loader: - kv 20: llama.vocab_size u32 = 131072
llama_model_loader: - kv 21: llama.rope.dimension_count u32 = 128
llama_model_loader: - kv 22: tokenizer.ggml.add_space_prefix bool = false
llama_model_loader: - kv 23: tokenizer.ggml.model str = gpt2
llama_model_loader: - kv 24: tokenizer.ggml.pre str = tekken
llama_model_loader: - kv 25: tokenizer.ggml.tokens arr[str,131072] = ["<unk>", "<s>", "</s>", "[INST]", "[...
llama_model_loader: - kv 26: tokenizer.ggml.token_type arr[i32,131072] = [3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, ...
time=2024-08-19T21:08:37.880+02:00 level=INFO source=server.go:618 msg="waiting for server to become available" status="llm server loading model"
Exception in thread Thread-2 (log_output):
Traceback (most recent call last):
File "/usr/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
self.run()
File "/usr/lib/python3.11/threading.py", line 982, in run
self._target(*self._args, **self._kwargs)
File "/app/share/Alpaca/alpaca/local_instance.py", line 22, in log_output
for line in iter(pipe.readline, ''):
File "<frozen codecs>", line 322, in decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc4 in position 128: invalid continuation byte
ERROR [window.py | connection_error] Connection error
INFO [local_instance.py | reset] Resetting Alpaca's Ollama instance
INFO [local_instance.py | stop] Stopping Alpaca's Ollama instance
INFO [local_instance.py | stop] Stopped Alpaca's Ollama instance
INFO [local_instance.py | start] Starting Alpaca's Ollama instance...
2024/08/19 21:08:39 routes.go:1108: INFO server config env="map[CUDA_VISIBLE_DEVICES: GPU_DEVICE_ORDINAL: HIP_VISIBLE_DEVICES: HSA_OVERRIDE_GFX_VERSION: OLLAMA_DEBUG:false OLLAMA_FLASH_ATTENTION:false OLLAMA_HOST:http://127.0.0.1:11435 OLLAMA_INTEL_GPU:false OLLAMA_KEEP_ALIVE:5m0s OLLAMA_LLM_LIBRARY: OLLAMA_MAX_LOADED_MODELS:0 OLLAMA_MAX_QUEUE:512 OLLAMA_MODELS:/home/ferran/.var/app/com.jeffser.Alpaca/data/.ollama/models OLLAMA_NOHISTORY:false OLLAMA_NOPRUNE:false OLLAMA_NUM_PARALLEL:0 OLLAMA_ORIGINS:[http://localhost https://localhost http://localhost:* https://localhost:* http://127.0.0.1 https://127.0.0.1 http://127.0.0.1:* https://127.0.0.1:* http://0.0.0.0 https://0.0.0.0 http://0.0.0.0:* https://0.0.0.0:* app://* file://* tauri://*] OLLAMA_RUNNERS_DIR: OLLAMA_SCHED_SPREAD:false OLLAMA_TMPDIR: ROCR_VISIBLE_DEVICES:]"
time=2024-08-19T21:08:39.324+02:00 level=INFO source=images.go:781 msg="total blobs: 44"
time=2024-08-19T21:08:39.325+02:00 level=INFO source=images.go:788 msg="total unused blobs removed: 0"
time=2024-08-19T21:08:39.325+02:00 level=INFO source=routes.go:1155 msg="Listening on 127.0.0.1:11435 (version 0.3.3)"
time=2024-08-19T21:08:39.325+02:00 level=WARN source=assets.go:100 msg="unable to cleanup stale tmpdir" path=/home/ferran/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama1103925684 error="remove /home/ferran/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama1103925684: directory not empty"
time=2024-08-19T21:08:39.325+02:00 level=INFO source=payload.go:30 msg="extracting embedded files" dir=/home/ferran/.var/app/com.jeffser.Alpaca/cache/tmp/ollama/ollama4015998741/runners
INFO [local_instance.py | start] Started Alpaca's Ollama instance
INFO [local_instance.py | start] Ollama version: 0.3.3
INFO [window.py | show_toast] There was an error with the local Ollama instance, so it has been reset
INFO [main] model loaded | tid="140080432502656" timestamp=1724094520
Hi, thanks for the report. It seems like Ollama is trying to output a message that can't be decoded as UTF-8, so I'm going to make it escape those lines:
https://github.com/Jeffser/Alpaca/commit/ed54b2846ae2bd35950aa7cff4840b53e41f97dd
This should be enough to stop the app from crashing. Thanks again for the report.
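Roughly, the idea is to stop assuming the instance's output is valid UTF-8 and to escape anything that isn't. A minimal sketch of that approach (illustrative only, not necessarily what the commit above does):

```python
# Sketch: launch the instance so that reading its output can never raise
# UnicodeDecodeError; invalid bytes get escaped into the log instead.
import subprocess
import threading
import logging

logging.basicConfig(level=logging.INFO)

def log_output(pipe):
    # With errors='backslashreplace' below, readline can't raise on bad bytes.
    for line in iter(pipe.readline, ''):
        logging.info(line.rstrip())

# Illustrative launch of an Ollama server; the real code wires in Alpaca's
# own environment, paths and error handling.
process = subprocess.Popen(
    ['ollama', 'serve'],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
    text=True,
    encoding='utf-8',
    errors='backslashreplace',  # escape bytes like 0xc4 instead of crashing
)
threading.Thread(target=log_output, args=(process.stdout,), daemon=True).start()
```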
Many thanks Jeffser, I will compile from sources.
If you need help with the project, I can help with coding.
Thanks, but I suggest you don't contribute just yet; I'm planning on rewriting most of the app at the end of the month when I start my college break.
I downloaded the sources and compiled from them, but I'm still getting the error.
If you have a todo list for that near future you mention, I can help you with it >> [email protected]
It only affects Mistral Nemo, and the error is still there even when compiling from git to get the new code you already committed.
I can confirm this
Btw, using a local Ollama instance with the Alpaca GUI there is no error, so it seems to be the embedded Ollama that gives the error with Nemo Mistral.
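To narrow that down, the embedded instance listens on 127.0.0.1:11435 (per the logs above), so it can be queried directly while Alpaca is running to see whether Ollama itself dies on Nemo. A rough sketch, assuming the model tag is mistral-nemo (check /api/tags for the real one):

```python
# Rough check (not Alpaca code): talk to the embedded Ollama directly,
# bypassing the GUI. Port 11435 comes from the OLLAMA_HOST value in the
# logs above; the 'mistral-nemo' tag is an assumption.
import requests

BASE = "http://127.0.0.1:11435"

# Which models does the embedded instance report?
print(requests.get(f"{BASE}/api/tags", timeout=10).json())

# Ask Nemo for a short completion; if the runner crashes, this fails with a
# connection error much like the one Alpaca shows.
resp = requests.post(
    f"{BASE}/api/generate",
    json={"model": "mistral-nemo", "prompt": "Hello", "stream": False},
    timeout=300,
)
resp.raise_for_status()
print(resp.json().get("response"))
```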
I can't be entirely sure it's the same bug, but I now specifically get "there was an error with the local ollama instance, so it has been reset" with Mistral Nemo under Alpaca 2.0.2, while other models appear to work.
When the error happens, Alpaca also appears to start more ollama instances (built into its flatpak), and the instances that error out aren't killed when Alpaca is closed either (sometimes resulting in several instances running and going out of memory).
Hi, I had a go at debugging this and got somewhere, but no idea how to actually fix it :face_with_diagonal_mouth:
git bisect points to this being introduced in https://github.com/Jeffser/Alpaca/commit/11dd13b430652a3440f1fe6b5a6287fa2b4b3213 . It appears to be caused by this code:
https://github.com/Jeffser/Alpaca/blob/08c0074ae531be8036583b8ce7f58409fd6a062c/src/connection_handler.py#L104
Looks like it may be the combination of subprocess.PIPE and text=True that results in the bug. Avoiding that combination, and replacing the following function's content with pass, results in it working as expected, but that's not a solution:
https://github.com/Jeffser/Alpaca/blob/08c0074ae531be8036583b8ce7f58409fd6a062c/src/connection_handler.py#L14-L24
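For what it's worth, here's a rough sketch of what avoiding that combination could look like, reading the pipe as bytes and decoding each line leniently (illustrative only, not a proposed patch):

```python
# Sketch of "avoiding that combination": open the pipe without text=True and
# decode each line by hand, so a single bad byte can't kill the logging thread.
import subprocess
import logging

logging.basicConfig(level=logging.INFO)

def log_output(pipe):
    for raw_line in iter(pipe.readline, b''):  # bytes, since text=True is gone
        logging.info(raw_line.decode('utf-8', errors='replace').rstrip())

process = subprocess.Popen(
    ['ollama', 'serve'],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
)
log_output(process.stdout)  # blocks here; the real app runs this in a thread
```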
Hi, sorry for abandoning this thread, I got busy with other stuff.
The code you mentioned makes logging the Ollama instance possible; it really shouldn't fail, because it has that try/except around it. I'm still not sure how this error happens; it might have something to do with how Ollama logs stuff.
I updated the Ollama instance recently; it might have fixed itself, it might not. I'm not sure, since Nemo Mistral worked for me before.
It doesn't appear to have fixed itself for me, it still happens with the 2.8.0 flatpak, specifically with Mistral Nemo. However, there is no "cannot decode" error currently in my logs, so I'm not fully sure it's still the same issue...
Let me know if you'd like an updated log, I won't spam it without an obvious need.
Nemo used to work for me as well, but I'm seeing the same "there was an error with the local ollama instance, so it has been reset" error when trying to use it now with Alpaca 3.1.0. I will note that I believe it's a different version of Nemo than the one that was working previously.
This issue seems solved for me as of Alpaca 4.0.0. I suggest those experiencing this issue upgrade and let us know...
Edit: sorry, I have to take it back; it was either a fluke or I got confused as to which model was running. I don't get an error message in Alpaca now, but the ollama process still dies, and in the console I get
[...]
Exception: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
I'm honestly really confused as to why Nemo doesn't work.
In a couple of weeks, once I have some free time, I will rewrite the instance manager; hopefully that should fix it.
Here's the full error message I get in my terminal in case it's useful:
ERROR [window.py | run_message] ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
Exception in thread Thread-22 (run_message):
Traceback (most recent call last):
File "/app/lib/python3.12/site-packages/urllib3/connectionpool.py", line 793, in urlopen
response = self._make_request(
^^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.12/site-packages/urllib3/connectionpool.py", line 537, in _make_request
response = conn.getresponse()
^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.12/site-packages/urllib3/connection.py", line 466, in getresponse
httplib_response = super().getresponse()
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/http/client.py", line 1428, in getresponse
response.begin()
File "/usr/lib/python3.12/http/client.py", line 331, in begin
version, status, reason = self._read_status()
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/http/client.py", line 300, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/lib/python3.12/site-packages/requests/adapters.py", line 486, in send
resp = conn.urlopen(
^^^^^^^^^^^^^
File "/app/lib/python3.12/site-packages/urllib3/connectionpool.py", line 847, in urlopen
retries = retries.increment(
^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.12/site-packages/urllib3/util/retry.py", line 470, in increment
raise reraise(type(error), error, _stacktrace)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.12/site-packages/urllib3/util/util.py", line 38, in reraise
raise value.with_traceback(tb)
File "/app/lib/python3.12/site-packages/urllib3/connectionpool.py", line 793, in urlopen
response = self._make_request(
^^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.12/site-packages/urllib3/connectionpool.py", line 537, in _make_request
response = conn.getresponse()
^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.12/site-packages/urllib3/connection.py", line 466, in getresponse
httplib_response = super().getresponse()
^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/http/client.py", line 1428, in getresponse
response.begin()
File "/usr/lib/python3.12/http/client.py", line 331, in begin
version, status, reason = self._read_status()
^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/http/client.py", line 300, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/app/share/Alpaca/alpaca/window.py", line 670, in run_message
response = self.ollama_instance.request("POST", "api/chat", json.dumps(data), lambda data, message_element=message_element: message_element.update_message(data))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/share/Alpaca/alpaca/connection_handler.py", line 82, in request
response = requests.post(connection_url, headers=self.get_headers(True), data=data, stream=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.12/site-packages/requests/api.py", line 115, in post
return request("post", url, data=data, json=json, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.12/site-packages/requests/api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.12/site-packages/requests/sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.12/site-packages/requests/sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/app/lib/python3.12/site-packages/requests/adapters.py", line 501, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.12/threading.py", line 1075, in _bootstrap_inner
self.run()
File "/usr/lib/python3.12/threading.py", line 1012, in run
self._target(*self._args, **self._kwargs)
File "/app/share/Alpaca/alpaca/window.py", line 675, in run_message
raise Exception(e)
Just a quick update: I tried again with v5.2.0 but Nemo still isn't happy. The terminal output gives less information than it did previously:
ERROR [instance_manager.py | generate_message] Connection error.
Mistral 7B is the other model I have installed and it works fine. Let me know if there's any information I can provide that would be useful in figuring out why Nemo doesn't work.
Possibly the same thing happens with the new Magistral model? Is Ollama crashing completely?