[Bug]: anthropic-beta header not being forwarded on litellm-proxy
What happened?
I got this error:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/xingyaow/.cache/pypoetry/virtualenvs/openhands-ai-xjaqfiyC-py3.12/lib/python3.12/site-packages/litellm/main.py", line 1729, in completion
raise e
File "/home/xingyaow/.cache/pypoetry/virtualenvs/openhands-ai-xjaqfiyC-py3.12/lib/python3.12/site-packages/litellm/main.py", line 1702, in completion
response = openai_chat_completions.completion(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/xingyaow/.cache/pypoetry/virtualenvs/openhands-ai-xjaqfiyC-py3.12/lib/python3.12/site-packages/litellm/llms/openai/openai.py", line 736, in completion
raise OpenAIError(
litellm.llms.openai.common_utils.OpenAIError: Error code: 400 - {'error': {'message': 'litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"max_tokens: 128000 > 64000, which is the maximum allowed number of output tokens for claude-3-7-sonnet-20250219"}}No fallback model group found for original model_group=anthropic/claude-3-7-sonnet-20250219. Fallbacks=[{\'claude-3-5-haiku-20241022\': [\'anthropic/claude-3-5-haiku-20241022\']}, {\'claude-3-5-sonnet-20241022\': [\'anthropic/claude-3-5-sonnet-20241022\']}, {\'claude-3-5-sonnet-20240620\': [\'anthropic/claude-3-5-sonnet-20240620\']}, {\'gpt-4o-2024-08-06\': [\'openai/gpt-4o-2024-08-06\']}, {\'gpt-4o-2024-05-13\': [\'openai/gpt-4o-2024-05-13\']}, {\'gpt-4o-mini-2024-07-18\': [\'openai/gpt-4o-mini-2024-07-18\']}]. Received Model Group=anthropic/claude-3-7-sonnet-20250219\nAvailable Model Group Fallbacks=None\nError doing the fallback: litellm.BadRequestError: AnthropicException - {"type":"error","error":{"type":"invalid_request_error","message":"max_tokens: 128000 > 64000, which is the maximum allowed number of output tokens for claude-3-7-sonnet-20250219"}}No fallback model group found for original model_group=anthropic/claude-3-7-sonnet-20250219. Fallbacks=[{\'claude-3-5-haiku-20241022\': [\'anthropic/claude-3-5-haiku-20241022\']}, {\'claude-3-5-sonnet-20241022\': [\'anthropic/claude-3-5-sonnet-20241022\']}, {\'claude-3-5-sonnet-20240620\': [\'anthropic/claude-3-5-sonnet-20240620\']}, {\'gpt-4o-2024-08-06\': [\'openai/gpt-4o-2024-08-06\']}, {\'gpt-4o-2024-05-13\': [\'openai/gpt-4o-2024-05-13\']}, {\'gpt-4o-mini-2024-07-18\': [\'openai/gpt-4o-mini-2024-07-18\']}] LiteLLM Retried: 2 times, LiteLLM Max Retries: 3', 'type': None, 'param': None, 'code': '400'}}
The same request works if I send it directly to Anthropic.
Relevant log output
# Repro script
kwargs={'messages': [{'content': [{'type': 'text', 'text': 'You are OpenHands agent, a helpful AI assistant that can interact with a computer to solve tasks.\n<IMPORTANT>\n* If user provides a path, you should NOT assume it\'s relative to the current working directory. Instead, you should explore the file system to find the file before working on it.\n* When configuring git credentials, use "openhands" as the user.name and "[email protected]" as the user.email by default, unless explicitly instructed otherwise.\n* You MUST NOT include comments in the code unless they are necessary to describe non-obvious behavior.\n* If the user asks you to edit a file, you should edit the file directly, do NOT create a new file with the updated content unless the user explicitly instructs you to do so.\n* When you are doing global search-and-replace, consider using `sed` instead of running file editor multiple times.\n* Only use GITHUB_TOKEN and other credentials in ways that the user has asked for and would expect. Do NOT make potentially dangerous changes (e.g. pushing to main, deleting a repository) unless explicitly asked to do so.\n* Use APIs to work with GitHub or other platforms, unless the user asks otherwise or your task requires browsing.\n* If you\'ve made repeated attempts to solve a problem, but the tests won\'t pass or the user says it\'s still broken, reflect on 5-7 different possible sources of the problem. Assess the likelihood of these options, and proceed with fixing the most likely one.\n</IMPORTANT>', 'cache_control': {'type': 'ephemeral'}}], 'role': 'system'}, {'content': [{'type': 'text', 'text': "<uploaded_files>\n/workspace/scikit-learn__scikit-learn__0.21\n</uploaded_files>\nI've uploaded a python code repository in the directory scikit-learn__scikit-learn__0.21. 
Consider the following issue description:\n\n<issue_description>\nPipeline should implement __len__\n#### Description\r\n\r\nWith the new indexing support `pipe[:len(pipe)]` raises an error.\r\n\r\n#### Steps/Code to Reproduce\r\n\r\n\r\nfrom sklearn import svm\r\nfrom sklearn.datasets import samples_generator\r\nfrom sklearn.feature_selection import SelectKBest\r\nfrom sklearn.feature_selection import f_regression\r\nfrom sklearn.pipeline import Pipeline\r\n\r\n# generate some data to play with\r\nX, y = samples_generator.make_classification(\r\n n_informative=5, n_redundant=0, random_state=42)\r\n\r\nanova_filter = SelectKBest(f_regression, k=5)\r\nclf = svm.SVC(kernel='linear')\r\npipe = Pipeline([('anova', anova_filter), ('svc', clf)])\r\n\r\nlen(pipe)\r\n\r\n\r\n#### Versions\r\n\r\n\r\nSystem:\r\n python: 3.6.7 | packaged by conda-forge | (default, Feb 19 2019, 18:37:23) [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]\r\nexecutable: /Users/krisz/.conda/envs/arrow36/bin/python\r\n machine: Darwin-18.2.0-x86_64-i386-64bit\r\n\r\nBLAS:\r\n macros: HAVE_CBLAS=None\r\n lib_dirs: /Users/krisz/.conda/envs/arrow36/lib\r\ncblas_libs: openblas, openblas\r\n\r\nPython deps:\r\n pip: 19.0.3\r\nsetuptools: 40.8.0\r\n sklearn: 0.21.dev0\r\n numpy: 1.16.2\r\n scipy: 1.2.1\r\n Cython: 0.29.6\r\n pandas: 0.24.1\r\n\n\n</issue_description>\n\nCan you help me implement the necessary changes to the repository so that the requirements specified in the <issue_description> are met?\nI've already taken care of all changes to any of the test files described in the <issue_description>. 
This means you DON'T have to modify the testing logic or any of the tests in any way!\nAlso the development Python environment is already set up for you (i.e., all dependencies already installed), so you don't need to install other packages.\nYour task is to make the minimal changes to non-test files in the /workspace directory to ensure the <issue_description> is satisfied.\nFollow these steps to resolve the issue:\n1. As a first step, it might be a good idea to explore the repo to familiarize yourself with its structure.\n2. Create a script to reproduce the error and execute it with `python <filename.py>` using the BashTool, to confirm the error\n3. Edit the sourcecode of the repo to resolve the issue\n4. Rerun your reproduce script and confirm that the error is fixed!\n5. Think about edgecases, add comprehensive tests for them in your reproduce script, and run them to make sure your fix handles them as well\n6. Once you are done with the initial implementation, please carefully re-read the problem description and check the difference between the current code and the base commit a62775e99f2a5ea3d51db7160fad783f6cd8a4c5. Do you think that the issue has been completely and comprehensively solved? Write tests to check the correctness of the solution, specifically focusing on tests that may point out any remaining problems that are not yet solved. Run all of the tests in the repo and check if any of them fail, and if they do fix the code. Repeat this process of carefully reading the problem description and current implementation, testing, and fixing any problems until you are confident that the current implementation is correct. 
Find and run any tests in the repo that are related to:\n - The issue you are fixing\n - The files you modified\n - The functions you changed\n Make sure all these tests pass with your changes.\nYour thinking should be thorough and so it's fine if it's very long.\n", 'cache_control': {'type': 'ephemeral'}}], 'role': 'user'}], 'tools': [{'type': 'function', 'function': {'name': 'execute_bash', 'description': 'Execute a bash command in the terminal.\n* Long running commands: For commands that may run indefinitely, it should be run in the background and the output should be redirected to a file, e.g. command = `python3 app.py > server.log 2>&1 &`.\n* Interact with running process: If a bash command returns exit code `-1`, this means the process is not yet finished. By setting `is_input` to `true`, the assistant can interact with the running process and send empty `command` to retrieve any additional logs, or send additional text (set `command` to the text) to STDIN of the running process, or send command like `C-c` (Ctrl+C), `C-d` (Ctrl+D), `C-z` (Ctrl+Z) to interrupt the process.\n* One command at a time: You can only execute one bash command at a time. If you need to run multiple commands sequentially, you can use `&&` or `;` to chain them together.\n', 'parameters': {'type': 'object', 'properties': {'command': {'type': 'string', 'description': 'The bash command to execute. Can be empty string to view additional logs when previous exit code is `-1`. Can be `C-c` (Ctrl+C) to interrupt the currently running process. Note: You can only execute one bash command at a time. If you need to run multiple commands sequentially, you can use `&&` or `;` to chain them together.'}, 'is_input': {'type': 'string', 'description': 'If True, the command is an input to the running process. If False, the command is a bash command to be executed in the terminal. 
Default is False.', 'enum': ['true', 'false']}}, 'required': ['command']}}}, {'type': 'function', 'function': {'name': 'think', 'description': 'Use the tool to think about something. It will not obtain new information or make any changes to the repository, but just log the thought. Use it when complex reasoning or brainstorming is needed.\n\nCommon use cases:\n1. When exploring a repository and discovering the source of a bug, call this tool to brainstorm several unique ways of fixing the bug, and assess which change(s) are likely to be simplest and most effective.\n2. After receiving test results, use this tool to brainstorm ways to fix failing tests.\n3. When planning a complex refactoring, use this tool to outline different approaches and their tradeoffs.\n4. When designing a new feature, use this tool to think through architecture decisions and implementation details.\n5. When debugging a complex issue, use this tool to organize your thoughts and hypotheses.\n\nThe tool simply logs your thought process for better transparency and does not execute any code or make changes.', 'parameters': {'type': 'object', 'properties': {'thought': {'type': 'string', 'description': 'The thought to log.'}}, 'required': ['thought']}}}, {'type': 'function', 'function': {'name': 'finish', 'description': "Signals the completion of the current task or conversation.\n\nUse this tool when:\n- You have successfully completed the user's requested task\n- You cannot proceed further due to technical limitations or missing information\n\nThe message should include:\n- A clear summary of actions taken and their results\n- Any next steps for the user\n- Explanation if you're unable to complete the task\n- Any follow-up questions if more information is needed\n\nThe task_completed field should be set to True if you believed you have completed the task, and False otherwise.\n", 'parameters': {'type': 'object', 'required': ['message', 'task_completed'], 'properties': {'message': {'type': 
'string', 'description': 'Final message to send to the user'}, 'task_completed': {'type': 'string', 'enum': ['true', 'false', 'partial'], 'description': 'Whether you have completed the task.'}}}}}, {'type': 'function', 'function': {'name': 'str_replace_editor', 'description': 'Custom editing tool for viewing, creating and editing files in plain-text format\n* State is persistent across command calls and discussions with the user\n* If `path` is a file, `view` displays the result of applying `cat -n`. If `path` is a directory, `view` lists non-hidden files and directories up to 2 levels deep\n* The `create` command cannot be used if the specified `path` already exists as a file\n* If a `command` generates a long output, it will be truncated and marked with `<response clipped>`\n* The `undo_edit` command will revert the last edit made to the file at `path`\n\nNotes for using the `str_replace` command:\n* The `old_str` parameter should match EXACTLY one or more consecutive lines from the original file. Be mindful of whitespaces!\n* If the `old_str` parameter is not unique in the file, the replacement will not be performed. Make sure to include enough context in `old_str` to make it unique\n* The `new_str` parameter should contain the edited lines that should replace the `old_str`\n', 'parameters': {'type': 'object', 'properties': {'command': {'description': 'The commands to run. Allowed options are: `view`, `create`, `str_replace`, `insert`, `undo_edit`.', 'enum': ['view', 'create', 'str_replace', 'insert', 'undo_edit'], 'type': 'string'}, 'path': {'description': 'Absolute path to file or directory, e.g. 
`/workspace/file.py` or `/workspace`.', 'type': 'string'}, 'file_text': {'description': 'Required parameter of `create` command, with the content of the file to be created.', 'type': 'string'}, 'old_str': {'description': 'Required parameter of `str_replace` command containing the string in `path` to replace.', 'type': 'string'}, 'new_str': {'description': 'Optional parameter of `str_replace` command containing the new string (if not given, no string will be added). Required parameter of `insert` command containing the string to insert.', 'type': 'string'}, 'insert_line': {'description': 'Required parameter of `insert` command. The `new_str` will be inserted AFTER the line `insert_line` of `path`.', 'type': 'integer'}, 'view_range': {'description': 'Optional parameter of `view` command when `path` points to a file. If none is given, the full file is shown. If provided, the file will be shown in the indicated line number range, e.g. [11, 12] will show lines 11 and 12. Indexing at 1 to start. Setting `[start_line, -1]` shows all lines from `start_line` to the end of the file.', 'items': {'type': 'integer'}, 'type': 'array'}}, 'required': ['command', 'path']}}}]}
response = litellm.completion(
    model='litellm_proxy/claude-3-7-sonnet-20250219',
    base_url='https://llm-proxy.app.all-hands.dev',
    api_key=API_KEY,
    thinking={
        "type": "enabled",
        "budget_tokens": 120000,
    },
    extra_headers={'anthropic-beta': 'output-128k-2025-02-19'},
    max_completion_tokens=128000,
    drop_params=True,
    **kwargs,
)
print(response)
Are you a ML Ops Team?
No
What LiteLLM version are you on?
v1.61.19; 1.61.20 on the LLM Proxy
Twitter / LinkedIn details
No response
max_tokens: 128000 > 64000, which is the maximum allowed number of output tokens for claude-3-7-sonnet-20250219
This looks like it came from Anthropic.
directly send the request to anthropic
Does it work with anthropic/ on the SDK?
Yes, it does work on the litellm SDK with anthropic/
extra_headers={'anthropic-beta': 'output-128k-2025-02-19'},
I assume this header isn't being forwarded.
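One way to check whether a particular hop drops the header is to point the same kind of request at a local echo server that records incoming headers. This is a standalone debugging sketch using only the standard library; the URL path and payload are placeholders, not the real proxy:

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

received = {}

class EchoHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Record every incoming header so the client can inspect them.
        received.update({k.lower(): v for k, v in self.headers.items()})
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        body = json.dumps({"ok": True}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence per-request logging

# Bind to an ephemeral port and serve in the background.
server = HTTPServer(("127.0.0.1", 0), EchoHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

req = urllib.request.Request(
    f"http://127.0.0.1:{server.server_port}/v1/chat/completions",
    data=b"{}",
    headers={"anthropic-beta": "output-128k-2025-02-19"},
    method="POST",
)
urllib.request.urlopen(req).read()
server.shutdown()

print(received.get("anthropic-beta"))  # expect 'output-128k-2025-02-19'
```

If the header shows up here but not in the proxy's outbound request to Anthropic, the drop is happening inside the proxy.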
Hit the same issue, any temporary workaround?
Surprisingly, it worked well for a good two hours and then started throwing this error, so it's very likely that something changed on Anthropic's end.
I believe they added some sort of validation. I am rolling back to claude-3-5 until they fix it. According to their bot, this header is only necessary for 3.7. @hardiksd, have you found any solution?
I temporarily rolled back to Sonnet 3.5; it is working as expected.
No solution so far. I also rolled back to Sonnet 3.5 for now.
Another workaround I found was to use OpenRouter to access Claude 3.7, but prompt caching was not working with it, so the costs are much higher than using the Anthropic API directly. On the plus side, there are no rate limits on OpenRouter, so it just runs non-stop, which felt good.
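Until the header is forwarded, another stopgap (beyond rolling back models) would be to clamp the requested output budget to the non-beta ceiling the error message reports (64000 for claude-3-7-sonnet-20250219). A minimal sketch; the helper name and constant are illustrative, not from LiteLLM:

```python
# Non-beta output-token ceiling per the error above; only a hypothetical
# client-side guard, not an official LiteLLM or Anthropic setting.
NON_BETA_MAX_OUTPUT_TOKENS = 64000

def clamp_max_tokens(requested: int, limit: int = NON_BETA_MAX_OUTPUT_TOKENS) -> int:
    """Return a max_tokens value the API accepts without the beta header."""
    return min(requested, limit)

print(clamp_max_tokens(128000))  # 64000
print(clamp_max_tokens(32000))   # 32000
```

Note that when extended thinking is enabled, `budget_tokens` must stay below `max_tokens`, so the thinking budget in the repro would need lowering as well.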
@xingyaoww is this bug making the context much smaller with OpenHands using Claude 3.7? It seems like it is coding very well but forgetting what it's doing very quickly.
any updates on this issue?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.