anthropic-sdk-python icon indicating copy to clipboard operation
anthropic-sdk-python copied to clipboard

API 400 error: `Could not process PDF`

Open gordonhart opened this issue 9 months ago • 4 comments

Occasionally, when processing PDFs provided as base64-encoded document blocks, I see the following error from the API:

anthropic.BadRequestError: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'Could not process PDF'}}

This happens intermittently for some PDFs. Often retries will work, and other times retries will be rejected with the same error message. Looking at a specific case that I've seen recently (can't share document unfortunately due to privacy concerns), I've verified that this isn't due to any of the following explicitly stated limitations:

  1. Token limit: the full message chain comes in at <120k tokens (well under the 200k limit)
  2. Page count: 74 PDF pages are provided (well under the 100 page limit)
  3. Request size: serialized message contents are 5.9MB (well under the 32MB limit)
  4. PDF is not encrypted or password-protected

API details:

  • Method: Anthropic.with_options(timeout=60, max_retries=0).messages.create(...)
  • Model: claude-3-5-sonnet-20241022 (also repros with claude-3-7-sonnet-20250219)
  • Tools: repros with or without tools and tool_choice blocks
  • Prompt caching: repros with or without prompt caching enabled (dict(cache_control=dict(type="ephemeral")))

Is this a known error from the API and is there any additional information that I can use to debug it from my end? PDF processing works great for >99% of PDFs, and only rarely do I run into this special bug. Thanks!

gordonhart avatar Mar 12 '25 19:03 gordonhart

👋 Apologies for the delayed response -- would you happen to have any request IDs you could share?

Typically, this error occurs if the PDF is malformed or has erroneous content

kevinc13 avatar Apr 29 '25 15:04 kevinc13

Hi @kevinc13. We cannot share the data directly due to privacy concerns but we have some batch API message IDs (of the form msgbatch_...) corresponding to some of these failures. Would those suffice?

dylangrandmont avatar Apr 29 '25 19:04 dylangrandmont

Yes those would be helpful!

kevinc13 avatar Apr 29 '25 20:04 kevinc13

Hi @kevinc13, here are a couple batch message ids that have recently failed with this error:

  • msgbatch_01NH21hQWuu8VnpEh1Z4PZhZ
  • msgbatch_01Wc7yQnoJgficrzXXYckHPF

nankolena avatar Apr 30 '25 20:04 nankolena