
Inconsistent Output Format Between Bedrock Playground and boto3 for DeepSeek-R1

celsofranssa opened this issue 7 months ago · 1 comment

Describe the bug

Main Issue

When using DeepSeek-R1 through the Bedrock model playground, the output is typically well-structured and easy to interpret — often divided into two logical components: reasoning and final answer.

However, when invoking the same model via the boto3 invoke_model method, the returned format differs significantly. Instead of a clearly structured output, the response contains a raw JSON payload with an array of choices, which appears inconsistent and often difficult to parse compared to the playground output.

This divergence is especially confusing because the DeepSeek-R1 documentation does not clarify or document these format differences.

Suggestion

Would it be possible for AWS Bedrock to standardize the output structure across all model interfaces (Playground, SDK, CLI)? For example:

Unified and predictable keys such as reasoning, final_answer, raw_text

Common response schema regardless of modelId, to reduce implementation friction

This inconsistency requires developers to write model-specific postprocessing code for each modelId, which defeats the purpose of having a unified Bedrock API.

Google Cloud’s Vertex AI, for instance, provides a more consistent schema when calling different LLMs (e.g., Claude, Gemini, LLaMA) via a unified API — reducing friction for multi-model development.

Why This Matters

  • Simplifies developer experience
  • Encourages faster experimentation across models
  • Reduces error-prone parsing logic and model-specific handling
  • Makes Bedrock more competitive with GCP and Azure AI offerings

Regression Issue

  • [ ] Select this option if this issue appears to be a regression.

Expected Behavior

import json

import boto3

client = boto3.client("bedrock-runtime")

response = client.invoke_model(
    modelId='us.deepseek.r1-v1:0',
    contentType='application/json',
    accept='application/json',
    body=json.dumps({
        "prompt": "John is older than Mary. Mary is older than Tom. Who is the oldest? ",
        "temperature": 0.7,
        "top_p": 0.9,
        "max_tokens": 512
    })
)

response = json.loads(response['body'].read().decode())

reasoning = response["reasoning"]
# Okay, let's see. The problem says John is older than Mary, and Mary is older than Tom. The question is asking who's the oldest. Hmm.
# First, let me break it down. So, John > Mary in age. Then Mary > Tom. So if I put these together, John is older than Mary, who is older than Tom. That would make John older than both Mary and Tom. So the order from oldest to youngest should be John, then Mary, then Tom. Therefore, John is the oldest.
# Wait, but maybe I should check if there's any other possibility. Like, could there be someone else not mentioned? The problem only mentions John, Mary, and Tom. So unless there's another person, but the question is about these three. So yes, John is the oldest. I think that's it. No complications here. The relationships are straightforward. John > Mary and Mary > Tom, so by transitivity, John > Tom as well. So the order is John, Mary, Tom. Oldest is John

final_answer = response["final_answer"]
# John is the oldest.

Current Behavior

The response contains:

{'choices': [{'text': ' - John is the oldest., John is older than Mary. Mary is older than Tom. Who is the youngest? - Tom is the youngest., John is older than Mary. Mary is older than Tom. Who is older than Mary? - John is older than Mary., John is older than Mary. Mary is older than Tom. Who is younger than Mary? - Tom is younger than Mary., John is older than Mary. Mary is older than Tom. Who is older than Tom? - John and Mary are older than Tom., John is older than Mary. Mary is older than Tom. Who is younger than John? - Mary and Tom are younger than John., John is older than Mary. Mary is older than Tom. Who is the oldest? - John is the oldest., John is older than Mary. Mary is older than Tom. Who is the youngest? - Tom is the youngest., John is older than Mary. Mary is older than Tom. Who is older than Mary? - John is older than Mary., John is older than Mary. Mary is older than Tom. Who is younger than Mary? - Tom is younger than Mary., John is older than Mary. Mary is older than Tom. Who is older than Tom? - John and Mary are older than Tom., John is older than Mary. Mary is older than Tom. Who is younger than John? - Mary and Tom are younger than John.,\nComparatives and Superlatives - Questions\nComparatives and Superlatives',
   'stop_reason': 'stop'}]}
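Until the format is standardized, callers have to write model-specific postprocessing against this raw shape. A minimal sketch of that parsing (the helper name is illustrative; the dict shape matches the raw payload shown above):

```python
# Sketch of the model-specific postprocessing currently required for
# DeepSeek-R1 responses returned by invoke_model. The helper name is
# illustrative, not an official API.

def parse_deepseek_invoke_response(payload: dict) -> tuple[str, str]:
    """Extract the generated text and stop reason from the raw
    {'choices': [{'text': ..., 'stop_reason': ...}]} payload."""
    choice = payload["choices"][0]
    return choice["text"], choice["stop_reason"]

# Example with a trimmed version of the payload above:
sample = {"choices": [{"text": " - John is the oldest.", "stop_reason": "stop"}]}
text, stop_reason = parse_deepseek_invoke_response(sample)
print(text)         # " - John is the oldest."
print(stop_reason)  # "stop"
```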

Reproduction Steps

import json

import boto3

client = boto3.client("bedrock-runtime")

response = client.invoke_model(
    modelId='us.deepseek.r1-v1:0',
    contentType='application/json',
    accept='application/json',
    body=json.dumps({
        "prompt": "John is older than Mary. Mary is older than Tom. Who is the oldest? ",
        "temperature": 0.7,
        "top_p": 0.9,
        "max_tokens": 512
    })
)

response = json.loads(response['body'].read().decode())

Possible Solution

No response

Additional Information/Context

No response

SDK version used

1.34.131

Environment details (OS name and version, etc.)

Ubuntu 22.04

celsofranssa avatar May 18 '25 16:05 celsofranssa

Hello @celsofranssa, thank you for reaching out. The output returned through boto3 or the CLI comes directly from the Bedrock service; the SDK passes through what the service provides. I have asked the Bedrock service team whether the output structure can be standardized across all model interfaces (Playground, SDK, CLI), and I will post here if there are updates from the team. If you have any questions, please let me know. Thank you.

For Internal tracking: P248402714

adev-code avatar Jun 05 '25 18:06 adev-code

Thanks for your patience. The service team has mentioned that the Converse API must be used to get standardized outputs.
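A hedged sketch of that route: invoking DeepSeek-R1 through the Converse API and splitting the response into reasoning and final answer. The live call is commented out (it needs AWS credentials); the parsing runs against a sample dict whose shape follows the Converse API documentation for reasoning models. Verify the `reasoningContent` field names against your boto3 version.

```python
# The live Converse call, commented out because it requires credentials:
#
# import boto3
# client = boto3.client("bedrock-runtime")
# response = client.converse(
#     modelId="us.deepseek.r1-v1:0",
#     messages=[{"role": "user", "content": [
#         {"text": "John is older than Mary. Mary is older than Tom. "
#                  "Who is the oldest?"}]}],
#     inferenceConfig={"temperature": 0.7, "topP": 0.9, "maxTokens": 512},
# )

def split_converse_output(response: dict) -> tuple[str, str]:
    """Collect reasoningContent and plain text blocks from a Converse response."""
    reasoning_parts, answer_parts = [], []
    for block in response["output"]["message"]["content"]:
        if "reasoningContent" in block:
            reasoning_parts.append(block["reasoningContent"]["reasoningText"]["text"])
        elif "text" in block:
            answer_parts.append(block["text"])
    return "".join(reasoning_parts), "".join(answer_parts)

# Sample response illustrating the documented Converse shape:
sample_response = {
    "output": {"message": {"role": "assistant", "content": [
        {"reasoningContent": {"reasoningText": {
            "text": "John > Mary > Tom, so John is the oldest."}}},
        {"text": "John is the oldest."},
    ]}}
}
reasoning, final_answer = split_converse_output(sample_response)
print(final_answer)  # John is the oldest.
```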

adev-code avatar Sep 16 '25 21:09 adev-code

This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one.

github-actions[bot] avatar Sep 16 '25 21:09 github-actions[bot]