LocalAI
WaitGroup is reused before previous Wait has returned
LocalAI version:
Latest: 2d64d8b444fb4fe6aa646dc938329ad4e3b4789d
Environment, CPU architecture, OS, and Version:
MacBook M1 Max, Sonoma 14.1.1
Quantized GGUFv3 model.
Describe the bug
Making multiple (for me, around 4-5) consecutive calls to the chat/completions API eventually returns an HTTP 500 error. This happens with a system message of roughly 250 tokens, a user message of roughly 10 tokens, and "max_tokens" set to 1.
I tried reproducing it with only a small message, but couldn't.
The binary was built with just make build - no GPU offloading.
This worked as of a couple of days ago, but I'm unable to pinpoint the exact commit that introduced the regression.
To Reproduce
Compile with make build
Run LocalAI with ./local-ai --models-path=./models/ --debug=true
Run several consecutive calls (I needed between 2 and 10) - this payload worked for me; a small Go driver that loops the same request is sketched after the curl command:
curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "llama-2-7b-chat",
"messages": [{"role": "system", "content": "7cyFQatpU35c5s5ddY0cTdK0AHF5YjM5UDhPAH8FqCYAKQJaBsquqpnMpVDujThVzk11Zj7tCdjQtTBfuDvGJ1WDjzWJ2khgZfThaIQnkMIaWOtgrNaZDVX9TWjlaANZntxN6Q4IsCFit8DOJk5v1zZAMnx98irkp61yrhfWqhYhc8ERaG4hoIUJrXkxUhFaCWMg2zx8yKNy6K0NQDqf6qK3ZBKIkKUI1san3vPsnHg3kQz1h39JcQIahOzVUEbm95lDOaWUpgnMxlKwbtiDfwQkxUwq45lj7A467WtdMyLxs5pwmZ8bF8dInLuLNrY1psGEPouDWxJstisNCjaqLv4wVOO753Pn62AHlsFWzQyTB7QC5GUiOxmjcg6WOmfmnv7fJLNxfH6gqYbeleUxJUPJFdeyD9H7IEDYfspypZFpOiSJs9js8kqjSR3p2yqRdYsxPsOCp3HkVp13grY2vKfmc1OHiC6bll5lhYd5tw5ul7HoXh6wF2oHoZ0aPogNGEfOrLqAQ41k6pxoFkQOHhUNHaoWSbJgXUB7U32doQsFrB3M4oYpa3neIwbauuwOuM5Mkc3t3FWCe0D9xAfld9EI67PbNrnTaqrVKHXWhtKEUSfGWCM5OFGWwvWmhpSkeMDq3hpbjef2S3WIDanqA9ek0uoKAnNQslvqlrKOYyKT5sbR8SUiem1pFIaiQJqH92FZhA1O68gxre4CaUSMpVCyXOVZ4efLQsV1nAnwTWCfKzKTK5Y9yzoAeWdHLIhYCpHWcv3sMwMiWlQvo2HlKzbYExcmsAGykcU2wbV4cOFF4IWR4oKdQntpUafYQtKjLKhH4doOxMPKnvmzYGnt7TOLkxYoH3QaKhUhDtXoZZYOvJFLlUopvojKnrQH3ECgVNbKqG2pSwBNZWy7tXKiDeYWTlWBuHrR0VjdmISkRFXWbgJqsXuLyAWdeEYb556rvjSIifACaODRM3jVGQObvjkERQ9WhvQo1x94xhRuVmYp3X9OYfIlDynyXMDILfpw33bAc8zjrfyxQSNmwNrDACgnoUhfWtswaV7uA357XwP3Lkjqf4yTrWrJUdAV4MwNTj16hrZUZmeFsQuZhBGJ7BY6dek8AnD8I6RzJAeF1GqOv2ZVz"},{"role": "user", "content": "Question: Is this a test?"}],
"temperature": 0.1, "max_tokens": 1
}'
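In case it helps, here is a small Go driver (my own sketch, not part of LocalAI) that fires the same request in a loop until a non-200 status comes back. The endpoint, model name and payload shape mirror the curl call above; the long random system prompt is replaced with a stand-in string of roughly the same length.

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"strings"
)

func main() {
	payload := map[string]any{
		"model": "llama-2-7b-chat",
		"messages": []map[string]string{
			// stand-in for the ~250-token random system prompt used above
			{"role": "system", "content": strings.Repeat("abc123XYZ ", 120)},
			{"role": "user", "content": "Question: Is this a test?"},
		},
		"temperature": 0.1,
		"max_tokens":  1,
	}
	body, _ := json.Marshal(payload)

	for i := 1; i <= 10; i++ {
		resp, err := http.Post("http://localhost:8080/v1/chat/completions",
			"application/json", bytes.NewReader(body))
		if err != nil {
			fmt.Println("request failed:", err)
			return
		}
		fmt.Printf("call %d: HTTP %d\n", i, resp.StatusCode)
		resp.Body.Close()
		if resp.StatusCode == http.StatusInternalServerError {
			fmt.Println("got the 500 after", i, "calls")
			return
		}
	}
	fmt.Println("no 500 within 10 calls")
}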
Expected behavior
A successful (non-500) response.
Logs
6:15PM DBG Loading model llama from llama-2-7b-chat
6:15PM DBG Model already loaded in memory: llama-2-7b-chat
6:15PM DBG GRPC(llama-2-7b-chat-*********:51645): stderr panic: sync: WaitGroup is reused before previous Wait has returned
6:15PM DBG GRPC(llama-2-7b-chat-*********:51645): stderr
6:15PM DBG GRPC(llama-2-7b-chat-*********:51645): stderr goroutine 46 [running]:
6:15PM DBG GRPC(llama-2-7b-chat-*********:51645): stderr sync.(*WaitGroup).Wait(0x1400028e510)
6:15PM DBG GRPC(llama-2-7b-chat-*********:51645): stderr /opt/homebrew/Cellar/go/1.21.4/libexec/src/sync/waitgroup.go:118 +0xac
6:15PM DBG GRPC(llama-2-7b-chat-*********:51645): stderr google.golang.org/grpc.(*Server).serveStreams(0x1400024c1e0, {0x100d61bd8?, 0x1400022a9c0})
6:15PM DBG GRPC(llama-2-7b-chat-*********:51645): stderr /Users/mrh/go/pkg/mod/google.golang.org/[email protected]/server.go:999 +0x1a8
6:15PM DBG GRPC(llama-2-7b-chat-*********:51645): stderr google.golang.org/grpc.(*Server).handleRawConn.func1()
6:15PM DBG GRPC(llama-2-7b-chat-*********:51645): stderr /Users/mrh/go/pkg/mod/google.golang.org/[email protected]/server.go:920 +0x44
6:15PM DBG GRPC(llama-2-7b-chat-*********:51645): stderr created by google.golang.org/grpc.(*Server).handleRawConn in goroutine 43
6:15PM DBG GRPC(llama-2-7b-chat-*********:51645): stderr /Users/mrh/go/pkg/mod/google.golang.org/[email protected]/server.go:919 +0x178
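For context, that panic message comes straight from Go's sync package: it fires when a sync.WaitGroup is reused (Add is called for a new round) while a goroutine is still inside Wait from the previous round - in the trace above it is grpc-go's Server.serveStreams that hits it. Below is a minimal standalone sketch of the misuse (not LocalAI or grpc-go code; because it is a scheduling race, it may take many iterations to fire, or not fire at all on a given run).

package main

import (
	"runtime"
	"sync"
)

func main() {
	for i := 0; i < 100000; i++ {
		var wg sync.WaitGroup
		wg.Add(1)

		go func() {
			wg.Wait() // panics here if wg is reused before this Wait returns
		}()

		runtime.Gosched() // give the waiter a chance to block inside Wait
		wg.Done()         // counter hits zero; the blocked waiter starts waking up
		wg.Add(1)         // reuse the WaitGroup for a new round while the old Wait may still be returning
		wg.Done()         // balance the Add so a waiter that never blocked can exit cleanly
	}
}

Per the sync.WaitGroup documentation, the fix on the library side is to not reuse the same WaitGroup for a new set of events until every Wait call from the previous set has returned.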
Additional context
name: llama-2-7b-chat
gpu_layers: 1
f16: true
parameters:
  top_k: 80
  top_p: 0.7
  model: llama-2-7b-chat
  temperature: 0.3
context_size: 4096
threads: 8
backend: llama
#system_prompt: |
#  You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
#  If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
template:
  chat: llama-2-chat
  chat_message: llama-2-chat-message
  #completion:
  #edit:
  #functions:
cat llama-2-chat-message.tmpl
{{if eq .RoleName "assistant"}}{{.Content}}{{else}}
[INST]
{{if .SystemPrompt}}{{.SystemPrompt}}{{else if eq .RoleName "system"}}<<SYS>>{{.Content}}<</SYS>>
{{else if .Content}}{{.Content}}{{end}}
[/INST]
{{end}}%
cat llama-2-chat.tmpl
<s>[INST]<<SYS>>
{{.SystemPrompt}}
<</SYS>>
[/INST]
{{ .Input }}
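For what it's worth, these .tmpl files are plain Go text/template files. Here is a rough standalone sketch (not LocalAI's actual rendering code) of how the chat_message template gets filled in; the chatMessage struct fields just mirror the placeholders visible above, and the way the rendered pieces get concatenated into the final prompt is an assumption on my part.

package main

import (
	"os"
	"text/template"
)

// Field names mirror the placeholders used in the templates above; the exact
// struct LocalAI passes in may differ.
type chatMessage struct {
	RoleName     string
	Content      string
	SystemPrompt string
}

func main() {
	tmpl := template.Must(template.ParseFiles("llama-2-chat-message.tmpl"))

	msgs := []chatMessage{
		{RoleName: "system", Content: "You are a helpful assistant."},
		{RoleName: "user", Content: "Question: Is this a test?"},
	}
	for _, m := range msgs {
		// Render each message on its own; the backend then assembles the full
		// prompt from these pieces (an assumption about the surrounding code).
		if err := tmpl.Execute(os.Stdout, m); err != nil {
			panic(err)
		}
	}
}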
Is that a gguf model? Which backend is being used? The logs are not enough here to debug properly. Can you give the full snippet?
Yeah, I know the details were crap. I managed to reproduce it with data I can actually share. One of the reproduction runs hits the error I reported here, and the other one contains a new panic.
I'll update the original request with the reproduction steps and the other info you requested.
I've updated the request with all the information - Do let me know if other info is required.
Thank you for creating this awesome project :D!
@mrh-chain it looks like your models yaml file is missing a model? Did you rename the file? If so, can you add the .gguf back to the file name and add that to your yaml file? (Don't forget to restart LocalAI after changing a model's yaml file)
Hey @lunamidori5 - the file is called llama-2-7b-chat (without an extension) on the local filesystem, and inference works most of the time, so I'm pretty sure the model naming isn't the culprit.
I tried changing the configured model name to add back the .gguf extension and then also renamed the file on the filesystem to contain the extension.
The results are the same unfortunately :(
:warning::warning::warning::warning::warning:
Hi! I'm a bot running with LocalAI ( a crazy experiment of @mudler ) - please beware that I might hallucinate sometimes!
but.... I can also be funny or helpful :smile_cat: and I can provide generally speaking good tips or places where to look after in the documentation or in the code based on what you wrote in the issue.
Don't engage in conversation with me, I don't support (yet) replying!
:warning::warning::warning::warning::warning:
ERROR: The prompt size exceeds the context window size and cannot be processed.
Sources:
- https://github.com/go-skynet/LocalAI/tree/master/.github/ISSUE_TEMPLATE/bug_report.md
- https://localai.io/basics/news/index.html
- https://localai.io/faq/index.html
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.