
bug: Self hosted container crashes due to random CPU spikes

Open reza-mohideen opened this issue 1 year ago • 1 comments

Describe the bug

We are seeing huge CPU and load spikes, which cause the entire application to crash and the API to become unavailable. (Screenshots attached: CPU utilization and load graphs, 2024-09-06.)

Even with load distributed across two containers, we see the same spikes. (Screenshot attached, 2024-09-06.)

To reproduce

We make at least one request every 5-10 seconds to our Langfuse server. We are running one container with 3.75 CPUs and 15 GB of memory. Our total trace count is 774,917.

We use LangChain to make our LLM calls:

# Cleaned-up reproduction snippet. `request`, `formatted_prompt`, `tags`, and
# `session_id` come from the surrounding request handler and are not shown here.
import os

from langchain_openai import AzureChatOpenAI  # or langchain.chat_models on older LangChain versions
from langfuse import Langfuse

# Client init not shown in the original snippet; assumes credentials via LANGFUSE_* env vars.
langfuse = Langfuse()

llm = AzureChatOpenAI(
    deployment_name="nocd-gpt4o",
    openai_api_version="2024-05-01-preview",
    openai_api_key=os.getenv("AZURE_APIM_OPENAI_GPT4O_KEY"),
    azure_endpoint=os.getenv("AZURE_APIM_OPENAI_GPT4O_HOST"),
    model="gpt-4o",
    cache=False,
)

# One trace per request, with a LangChain callback handler scoped to it.
trace = langfuse.trace(
    name=request.project_name,
    user_id=request.user,
    tags=tags,
    metadata=request.metadata if request.metadata else {},
    version=request.version if request.version else "1",
    session_id=session_id,
    input=formatted_prompt,
)

langfuse_handler = trace.get_langchain_handler()

resp = llm.invoke(formatted_prompt, config={"callbacks": [langfuse_handler]})

trace.update(output=resp)
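
Not part of the original handler code: a minimal sketch of the request cadence described under "To reproduce" (at least one call every 5-10 seconds), with run_one_request as a hypothetical stand-in for the trace + invoke code above.

import random
import time

def run_one_request() -> None:
    # Hypothetical stand-in for the trace / llm.invoke / trace.update flow shown above.
    ...

while True:
    run_one_request()
    time.sleep(random.uniform(5, 10))  # at least one request every 5-10 seconds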

SDK and container versions

Container version: 2.78.0
Python SDK version: 2.47.0

Additional information

No response

Are you interested to contribute a fix for this bug?

Yes

reza-mohideen avatar Sep 06 '24 22:09 reza-mohideen

@reza-mohideen, thanks for opening the issue. This is very interesting. I have a few follow-up questions:

  • Do you have a special usage pattern? How many traces do you ingest per minute?
  • Can you share higher-granularity CPU metrics? I'd be interested to know whether the CPU is that high all the time or only during certain CPU-intensive operations.
  • Do you have large inputs/outputs on your traces, and do you tokenize and calculate cost in Langfuse? We use tiktoken for tokenization, which is quite CPU-heavy (Docs); see the sketch after this list.
  • Do you by any chance see high error rates on the APIs? Does the UI load for you, or do you see high latencies there?
  • Could you share server logs from around the crashes? Do you see any crash reason?
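
To illustrate the tokenization point, here is a rough micro-benchmark sketch (the encoding name and sample size are assumptions, not taken from this issue; requires a tiktoken version that ships o200k_base):

import time

import tiktoken

# o200k_base is the encoding used by gpt-4o; swap in the encoding for your model.
enc = tiktoken.get_encoding("o200k_base")

# Stand-in for a large trace input/output (~600 kB of text).
sample = "lorem ipsum " * 50_000

start = time.perf_counter()
tokens = enc.encode(sample)
elapsed = time.perf_counter() - start

print(f"tokenized {len(tokens)} tokens in {elapsed:.3f}s")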

maxdeichmann avatar Sep 23 '24 14:09 maxdeichmann

@reza-mohideen, any additional input here would be super helpful, as we do not observe this issue in our own environments. We would love to help resolve this or otherwise close the issue.

marcklingen avatar Oct 28 '24 08:10 marcklingen

@reza-mohideen, I would recommend upgrading to v3 (https://langfuse.com/self-hosting). This new major version contains significant performance improvements across the board.

maxdeichmann avatar Dec 11 '24 14:12 maxdeichmann