Inconsistent Output Format Between Bedrock Playground and boto3 for DeepSeek-R1
Describe the bug
Main Issue
When using DeepSeek-R1 through the Bedrock model playground, the output is typically well-structured and easy to interpret — often divided into two logical components: reasoning and final answer.
However, when invoking the same model via the boto3 invoke_model method, the returned format differs significantly. Instead of a clearly structured output, the response contains a raw JSON payload with an array of choices, which appears inconsistent and often difficult to parse compared to the playground output.
This divergence is especially confusing because the DeepSeek-R1 documentation does not clarify or document these format differences.
Suggestion
Would it be possible for AWS Bedrock to standardize the output structure across all model interfaces (Playground, SDK, CLI)? For example:
- Unified and predictable keys such as reasoning, final_answer, raw_text
- A common response schema regardless of modelId, to reduce implementation friction
This inconsistency requires developers to write model-specific postprocessing code for each modelId, which defeats the purpose of having a unified Bedrock API.
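For illustration, the invoke_model response for this prompt could then look roughly like the sketch below; this is a hypothetical schema built from the keys suggested above, not an existing Bedrock format:

{
    "reasoning": "John > Mary and Mary > Tom, so by transitivity John is older than both.",
    "final_answer": "John is the oldest.",
    "raw_text": "<the unprocessed model output>"
}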
Google Cloud’s Vertex AI, for instance, provides a more consistent schema when calling different LLMs (e.g., Claude, Gemini, LLaMA) via a unified API — reducing friction for multi-model development.
Why This Matters
- Simplifies developer experience
- Encourages faster experimentation across models
- Reduces error-prone parsing logic and model-specific handling
- Makes Bedrock more competitive with GCP and Azure AI offerings
Regression Issue
- [ ] Select this option if this issue appears to be a regression.
Expected Behavior
import json

import boto3

# Bedrock Runtime client; the region must offer the DeepSeek-R1 model.
client = boto3.client("bedrock-runtime")

response = client.invoke_model(
    modelId='us.deepseek.r1-v1:0',
    contentType='application/json',
    accept='application/json',
    body=json.dumps({
        "prompt": "John is older than Mary. Mary is older than Tom. Who is the oldest? ",
        "temperature": 0.7,
        "top_p": 0.9,
        "max_tokens": 512
    })
)
response = json.loads(response['body'].read().decode())

# Desired: the reasoning trace exposed under its own key.
reasoning = response["reasoning"]
# Okay, let's see. The problem says John is older than Mary, and Mary is older than Tom. The question is asking who's the oldest. Hmm.
# First, let me break it down. So, John > Mary in age. Then Mary > Tom. So if I put these together, John is older than Mary, who is older than Tom. That would make John older than both Mary and Tom. So the order from oldest to youngest should be John, then Mary, then Tom. Therefore, John is the oldest.
# Wait, but maybe I should check if there's any other possibility. Like, could there be someone else not mentioned? The problem only mentions John, Mary, and Tom. So unless there's another person, but the question is about these three. So yes, John is the oldest. I think that's it. No complications here. The relationships are straightforward. John > Mary and Mary > Tom, so by transitivity, John > Tom as well. So the order is John, Mary, Tom. Oldest is John

# Desired: the final answer exposed under its own key.
final_answer = response["final_answer"]
# John is the oldest.
Current Behavior
The response contains:
{'choices': [{'text': ' - John is the oldest., John is older than Mary. Mary is older than Tom. Who is the youngest? - Tom is the youngest., John is older than Mary. Mary is older than Tom. Who is older than Mary? - John is older than Mary., John is older than Mary. Mary is older than Tom. Who is younger than Mary? - Tom is younger than Mary., John is older than Mary. Mary is older than Tom. Who is older than Tom? - John and Mary are older than Tom., John is older than Mary. Mary is older than Tom. Who is younger than John? - Mary and Tom are younger than John., John is older than Mary. Mary is older than Tom. Who is the oldest? - John is the oldest., John is older than Mary. Mary is older than Tom. Who is the youngest? - Tom is the youngest., John is older than Mary. Mary is older than Tom. Who is older than Mary? - John is older than Mary., John is older than Mary. Mary is older than Tom. Who is younger than Mary? - Tom is younger than Mary., John is older than Mary. Mary is older than Tom. Who is older than Tom? - John and Mary are older than Tom., John is older than Mary. Mary is older than Tom. Who is younger than John? - Mary and Tom are younger than John.,\nComparatives and Superlatives - Questions\nComparatives and Superlatives',
'stop_reason': 'stop'}]}
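As a workaround with the current format, the text has to be pulled out of the choices array by hand; a minimal sketch, assuming the decoded payload shape shown above:

# Minimal workaround, assuming the decoded payload has the "choices" shape above.
choice = response["choices"][0]
raw_text = choice["text"]
stop_reason = choice.get("stop_reason")
# Splitting reasoning from the final answer still has to be done manually,
# which is exactly the model-specific postprocessing this issue describes.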
Reproduction Steps
import json

import boto3

# Bedrock Runtime client; the region must offer the DeepSeek-R1 model.
client = boto3.client("bedrock-runtime")

response = client.invoke_model(
    modelId='us.deepseek.r1-v1:0',
    contentType='application/json',
    accept='application/json',
    body=json.dumps({
        "prompt": "John is older than Mary. Mary is older than Tom. Who is the oldest? ",
        "temperature": 0.7,
        "top_p": 0.9,
        "max_tokens": 512
    })
)
# Decode the streaming body and inspect the payload: it only contains a
# "choices" list, not separate reasoning / final-answer fields.
response = json.loads(response['body'].read().decode())
Possible Solution
No response
Additional Information/Context
No response
SDK version used
1.34.131
Environment details (OS name and version, etc.)
Ubuntu 22.04
Hello @celsofranssa, thank you for reaching out. The output you see from boto3 or the CLI comes directly from what the Bedrock service returns. I have reached out to the Bedrock service team to ask whether the output structure can be standardized across all model interfaces (Playground, SDK, CLI), and I will post an update once I hear back from them. If you have any questions, please let me know. Thank you.
For Internal tracking: P248402714
Thanks for your patience. The service team has mentioned that the Converse API must be used to get standardized output.
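For reference, a minimal sketch of that Converse path for this model; the exact shape of the reasoningContent block below is an assumption and should be checked against the current Bedrock Converse documentation for DeepSeek-R1:

import boto3

client = boto3.client("bedrock-runtime")

response = client.converse(
    modelId="us.deepseek.r1-v1:0",
    messages=[
        {
            "role": "user",
            "content": [{"text": "John is older than Mary. Mary is older than Tom. Who is the oldest?"}],
        }
    ],
    inferenceConfig={"temperature": 0.7, "topP": 0.9, "maxTokens": 512},
)

# Converse returns one message whose content is a list of typed blocks.
for block in response["output"]["message"]["content"]:
    if "reasoningContent" in block:
        # Assumed block shape for reasoning models; verify against the docs.
        print("reasoning:", block["reasoningContent"]["reasoningText"]["text"])
    elif "text" in block:
        print("answer:", block["text"])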
This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.