
[Bug]: groq models do not support streaming when in JSON mode

ericmjl opened this issue 1 year ago

What happened?

It appears that with LiteLLM version 1.35.38 (I have not upgraded to the latest because of other issues with Ollama JSON mode), I am unable to use Groq models in JSON mode with streaming enabled. I have a minimal notebook that reproduces this issue on GitHub gist: https://gist.github.com/ericmjl/6f3e2cbbfcf26a8f3334a58af6a76f63

Relevant log output

You can find the notebook here: https://gist.github.com/ericmjl/6f3e2cbbfcf26a8f3334a58af6a76f63

Twitter / LinkedIn details

@ericmjl

ericmjl avatar Jul 20 '24 04:07 ericmjl
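
A minimal reproduction along the lines of the linked notebook could look like the sketch below; the model name and prompt here are illustrative, not taken from the gist.

from litellm import completion

# Combining JSON mode with streaming is the combination the report says fails.
response = completion(
    model="groq/llama3-8b-8192",
    messages=[{"role": "user", "content": "Describe today's weather as a JSON object."}],
    response_format={"type": "json_object"},
    stream=True,
)

# On newer LiteLLM versions this surfaces as a GroqException
# (400: "`response_format` does not support streaming").
for chunk in response:
    print(chunk.choices[0].delta.content or "", end="")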

On the latest version I get this error, @ericmjl - would you expect litellm to fake the streaming response?

 GroqException - Error code: 400 - {'error': {'message': 'response_format` does not support streaming', 'type': 'invalid_request_error'}}

ishaan-jaff avatar Jul 20 '24 20:07 ishaan-jaff

@ishaan-jaff thinking about the problem from your perspective as a library maintainer, faking the streaming response might be good for the LiteLLM user experience, but it'd also add a special case for you all to handle. I would love to see the streaming response faked (Groq is fast enough that, for all practical purposes, waiting for the full text to return is almost as good as seeing it stream), though I am cognizant of the extra burden it might put on you all.

ericmjl avatar Jul 24 '24 01:07 ericmjl
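
For context, the kind of client-side workaround being discussed could look roughly like this: make a non-streaming JSON-mode call and re-emit the text in pieces. This is only an illustration of the idea; fake_stream_json and the chunk size are invented for the example and are not part of LiteLLM.

from litellm import completion

def fake_stream_json(model, messages, chunk_size=64):
    """Illustrative helper (not part of LiteLLM): call the model in JSON mode
    without streaming, then yield the full response in small pieces so the
    caller can consume it like a stream."""
    response = completion(
        model=model,
        messages=messages,
        response_format={"type": "json_object"},
        stream=False,  # Groq rejects response_format together with stream=True
    )
    text = response.choices[0].message.content
    for i in range(0, len(text), chunk_size):
        yield text[i:i + chunk_size]

for piece in fake_stream_json(
    "groq/llama3-8b-8192",
    [{"role": "user", "content": "Return a JSON object with a 'greeting' key."}],
):
    print(piece, end="")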

I am not able to get litellm to send response_format to Groq at all. Have you run into that issue as well (streaming aside), @ericmjl?

p-c-mo avatar Jan 04 '25 23:01 p-c-mo

What error do you see when sending response_format, @misterfancysocks?

Are you on the latest litellm version?

ishaan-jaff avatar Jan 05 '25 00:01 ishaan-jaff

Hey @ishaan-jaff

I'm not sure how to figure out which version of litellm (Docker) I'm using, but here is the info for the image:

ghcr.io/berriai/litellm        main-stable       96ca897120c4   3 weeks ago     1.37GB
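
(As an aside, one generic way to check the installed litellm version from inside the container is to query the package metadata with the Python standard library; nothing litellm-specific is assumed here.)

# Run inside the running container, e.g. via `docker exec -it <container> python`.
from importlib.metadata import version

print(version("litellm"))  # prints the installed litellm package version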

Here is my code:

import warnings
warnings.filterwarnings("ignore", category=UserWarning, module="pydantic")

import litellm
from litellm import completion
from dotenv import load_dotenv
import os

load_dotenv(os.path.expanduser('~/code/consumio/consumioish/.env'))

# os.environ['LITELLM_LOG'] = 'DEBUG'
litellm.set_verbose = True  # deprecated; newer versions prefer os.environ['LITELLM_LOG'] = 'DEBUG'
litellm.api_base = "http://localhost:4000"  # local LiteLLM proxy (Docker)
litellm.api_key = os.getenv("LITELLM_API_KEY")
litellm.success_callback = ["langfuse"]

response = completion(
  model="groq/llama3-8b-8192",
  messages=[{"role": "user", "content": "hows it going? "}],
  response_format={"type": "json_object"},  # request JSON mode
  # stream=False
)

print(response.choices[0].message.content)

What I'm seeing is that when I submit a request with 'response_format', the logs acknowledge that I've requested it, but the actual curl request does not include it.

Request to litellm:
litellm.completion(model='groq/llama3-8b-8192', messages=[{'role': 'user', 'content': 'hows it going? '}], response_format={'type': 'json_object'})


18:49:27 - LiteLLM:WARNING: utils.py:316 - `litellm.set_verbose` is deprecated. Please set `os.environ['LITELLM_LOG'] = 'DEBUG'` for debug logs.
SYNC kwargs[caching]: False; litellm.cache: None; kwargs.get('cache')['no-cache']: False
Final returned optional params: {'extra_body': {}}


POST Request Sent from LiteLLM:
curl -X POST \
https://api.groq.com/openai/v1/chat/completions \
-H 'Content-Type: *****' -H 'Authorization: Bearer gsk_VAWbOVuF********************************************' \
-d '{'model': 'llama3-8b-8192', 'messages': [{'role': 'user', 'content': 'hows it going? '}], 'stream': False}'


RAW RESPONSE:
{"id": "chatcmpl-7e51986f-c022-4eaa-9f8b-42f7afbdc6fb", "object": "chat.completion", "created": 1736038167, "model": "llama3-8b-8192", "choices": [{"index": 0, "message": {"role": "assistant", "content": "I'm just an AI, I don't have feelings or emotions like humans do, but I'm functioning properly and ready to assist you with any questions or tasks you may have! How can I help you today?"}, "logprobs": null, "finish_reason": "stop"}], "usage": {"queue_time": 0.018435816, "prompt_tokens": 16, "prompt_time": 0.002348172, "completion_tokens": 44, "completion_time": 0.036666667, "total_tokens": 60, "total_time": 0.039014839}, "system_fingerprint": "fp_a97cfe35ae", "x_groq": {"id": "req_01jg******************"}}


Returned custom cost for model=groq/llama3-8b-8192 - prompt_tokens_cost_usd_dollar: 8e-07, completion_tokens_cost_usd_dollar: 3.52e-06
reaches langfuse for success logging!
Returned custom cost for model=groq/llama3-8b-8192 - prompt_tokens_cost_usd_dollar: 8e-07, completion_tokens_cost_usd_dollar: 3.52e-06
I'm just an AI, I don't have feelings or emotions like humans do, but I'm functioning properly and ready to assist you with any questions or tasks you may have! How can I help you today?

p-c-mo avatar Jan 05 '25 01:01 p-c-mo
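
One way to sanity-check whether litellm maps response_format for a given Groq model is get_supported_openai_params, assuming the installed version exposes it (the exact return format can vary between releases):

from litellm import get_supported_openai_params

# Ask litellm which OpenAI-style params it maps for this provider/model.
supported = get_supported_openai_params(
    model="llama3-8b-8192",
    custom_llm_provider="groq",
)
print(supported)
print("response_format supported:", "response_format" in supported)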

I see the issue: we handle structured output for Groq by leveraging their tool calling. Our test missed the JSON mode scenario.

Here's the issue - https://github.com/BerriAI/litellm/blob/d74fa394543df9b38eec7ee9b0b6e440e3f2db07/litellm/llms/groq/chat/transformation.py#L153

Will push a fix ASAP.

krrishdholakia avatar Jan 05 '25 02:01 krrishdholakia
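
For readers unfamiliar with the approach: the general pattern is to translate a response_format={"type": "json_object"} request into a forced tool call and read the JSON back out of the tool call's arguments. The payload below is only a conceptual illustration of that pattern; the tool name json_response is invented and this is not litellm's actual transformation (see the linked transformation.py for the real code).

# Conceptual illustration only: the tool name and schema are invented
# for the example and do not reflect litellm internals.
original_request = {
    "model": "llama3-8b-8192",
    "messages": [{"role": "user", "content": "hows it going? "}],
    "response_format": {"type": "json_object"},
}

transformed_request = {
    "model": original_request["model"],
    "messages": original_request["messages"],
    # JSON mode is emulated by forcing the model to "call" a tool whose
    # arguments carry the JSON object we want back.
    "tools": [{
        "type": "function",
        "function": {
            "name": "json_response",
            "description": "Return the answer as a JSON object.",
            "parameters": {"type": "object", "additionalProperties": True},
        },
    }],
    "tool_choice": {"type": "function", "function": {"name": "json_response"}},
}

# The JSON text is then recovered from
# choices[0].message.tool_calls[0].function.arguments in the provider response.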

> I see the issue, we handle structured output for groq by leveraging their tool calling. [...] will push a fix asap

You are the best, thank you!

p-c-mo avatar Jan 05 '25 03:01 p-c-mo

@krrishdholakia I just updated to 1.56.10 and it didn't work for me. Looking at the diff, it looks like there was just a test that was added.

p-c-mo avatar Jan 05 '25 04:01 p-c-mo

It's not on v1.56.10. The fix is on main. Will be on v1.57.0

krrishdholakia avatar Jan 05 '25 06:01 krrishdholakia

> It's not on v1.56.10. The fix is on main. Will be on v1.57.0

Got it. Do you have a rough ETA?

p-c-mo avatar Jan 05 '25 15:01 p-c-mo

Should be out today, hopefully. I believe we were just seeing some Vertex rate-limit errors causing the test to fail.

krrishdholakia avatar Jan 05 '25 15:01 krrishdholakia

Hey @krrishdholakia, did this ever get deployed to the stable or latest images?

p-c-mo avatar Feb 16 '25 04:02 p-c-mo

@p-c-mo Yes this looks like it was merged in a while ago. Are you still seeing this issue?

krrishdholakia avatar Feb 16 '25 05:02 krrishdholakia

@krrishdholakia I feel dumb asking this, but I am running this via docker-compose and can't figure out how to see the request headers that the litellm proxy is sending.

p-c-mo avatar Feb 17 '25 02:02 p-c-mo

Following up, @krrishdholakia.

p-c-mo avatar Feb 19 '25 02:02 p-c-mo