
llama-cpp: fix tabby when using llama-cpp with versions b2320+ (introduction of abort callbacks)

Open ghthor opened this issue 10 months ago • 1 comment

Please describe the feature you want

Enable tabbyml to run with llama-cpp version b2320 and newer.

Additional context

I've run into an issue supporting tabbyml on NixOS. I was unable to get tabbyml to compile against the vendored copy of llama-cpp, so I opted to dynamically link against the existing llama-cpp package in nixpkgs instead of statically linking the vendored version included in this repo. When I initially merged support for tabby 0.8.3, the llama-cpp version in nixpkgs was b2296 and everything worked.

Since then, the llama-cpp package in nixpkgs has been upgraded multiple times. I recently tried to upgrade my system and tabbyml broke. I narrowed this down to the version of llama-cpp it was being linked against: after bisecting, b2320 is the release that introduces the change that breaks tabbyml. b2319 works as expected, with no issues. With anything newer than b2319, tabbyml starts up and finds the GPU, but then segfaults.

Apr 01 19:21:02 cryptnix systemd[1]: Started Self-hosted AI coding assistant using large language models.
Apr 01 19:21:03 cryptnix tabby[135490]: 2024-04-01T23:21:03.214601Z  INFO tabby::serve: crates/tabby/src/serve.rs:116: Starting server, this might take a few minutes...
Apr 01 19:21:03 cryptnix tabby[135490]: 2024-04-01T23:21:03.218108Z  INFO tabby::services::code: crates/tabby/src/services/code.rs:53: Index is ready, enabling server...
Apr 01 19:21:03 cryptnix tabby[135490]: ggml_init_cublas: GGML_CUDA_FORCE_MMQ:   no
Apr 01 19:21:03 cryptnix tabby[135490]: ggml_init_cublas: CUDA_USE_TENSOR_CORES: yes
Apr 01 19:21:03 cryptnix tabby[135490]: ggml_init_cublas: found 1 CUDA devices:
Apr 01 19:21:03 cryptnix tabby[135490]:   Device 0: NVIDIA GeForce RTX 2080 SUPER, compute capability 7.5, VMM: yes
Apr 01 19:21:04 cryptnix tabby[135490]: *** stack smashing detected ***: terminated
Apr 01 19:21:07 cryptnix systemd[1]: tabby.service: Main process exited, code=dumped, status=6/ABRT
Apr 01 19:21:07 cryptnix systemd[1]: tabby.service: Failed with result 'core-dump'.

I'm not sure exactly how this change introduces the segfault (https://github.com/ggerganov/llama.cpp/commit/4a6e2d6142ab815c964924896891e9ab3e050632), but it looks like it could if tabbyml isn't providing an abort callback to llama-cpp; i.e., I imagine that with further inspection we'd find llama-cpp expects that callback to be non-NULL, and that's why it is segfaulting.
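
If so, here is a minimal sketch of registering an explicit callback, assuming the `ggml_abort_callback` typedef and the `llama_set_abort_callback` setter that commit introduces (the helper below is illustrative, not tabbyml code):

```cpp
#include "llama.h"

// Illustrative no-op abort callback: returning true would ask llama-cpp
// to abort the in-flight computation, so we always return false.
static bool never_abort(void* /*user_data*/) {
    return false;
}

// Hypothetical helper: register the callback on an existing context so
// llama-cpp never has to invoke a NULL function pointer.
void install_abort_callback(llama_context* ctx) {
    llama_set_abort_callback(ctx, never_abort, /*abort_callback_data=*/nullptr);
}
```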

This isn't so much a feature request as a warning about a potential issue we may hit when updating the vendored version of llama-cpp in tabbyml. That said, I'm looking to implement a fix in tabbyml for this issue, since it has caused the nixpkgs version of tabby to break without a clear path for users to fix the breakage.
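
As a sketch of what such a fix might look like (C++, assuming the b2320+ llama.h API; `create_context` and the parameter values are hypothetical, not taken from this repository): build the context parameters from `llama_context_default_params()` so fields added in newer releases are initialized rather than left undefined, and register an explicit abort callback:

```cpp
#include "llama.h"

// Illustrative no-op callback; see the hypothesis above.
static bool never_abort(void* /*user_data*/) {
    return false;
}

// Hypothetical wrapper around context creation.
llama_context* create_context(llama_model* model) {
    // Start from the library defaults so newly added fields such as
    // abort_callback/abort_callback_data are initialized.
    llama_context_params params = llama_context_default_params();
    params.n_ctx = 2048;  // illustrative value

    llama_context* ctx = llama_new_context_with_model(model, params);
    if (ctx != nullptr) {
        // Register an explicit callback in case llama-cpp invokes it
        // without a NULL check.
        llama_set_abort_callback(ctx, never_abort, nullptr);
    }
    return ctx;
}
```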


Please reply with a 👍 if you want this feature.

ghthor avatar Apr 01 '24 23:04 ghthor

The issue seems to be gone after upgrading to b2715 in #1926.

wsxiaoys avatar Apr 22 '24 22:04 wsxiaoys