
[BUG] API Timeout Issues (#1513) persist in 1.0.9

Open thomasj02 opened this issue 5 months ago • 39 comments

I can't re-open https://github.com/anthropics/claude-code/issues/1513 so I'm creating this new bug report. #1513 was for Claude never retrying in 1.0.8. Now in 1.0.9 it retries a couple of times and then gives up. For example:

  ⎿  API Error (Request timed out.) · Retrying in 1 seconds… (attempt 1/10)
  ⎿  API Error (Request timed out.) · Retrying in 1 seconds… (attempt 2/10)
  ⎿  API Error: Request timed out.

>

See additional comments at the end of #1513 for other people experiencing the same problem

Tagging @bcherny and @ant-kurt who investigated #1513
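
For anyone comparing transcripts, here is a minimal Python sketch that counts the retry lines in pasted output and flags the case shown above, where the final error appears before the advertised attempt limit is reached. The function name `summarize_retries` is hypothetical and is not part of Claude Code; it only pattern-matches the message format printed in this report.

```python
import re

# Matches lines such as:
#   API Error (Request timed out.) · Retrying in 1 seconds… (attempt 1/10)
RETRY_RE = re.compile(r"Retrying in (\d+) seconds… \(attempt (\d+)/(\d+)\)")
FINAL_ERROR = "API Error: Request timed out."


def summarize_retries(transcript: str) -> dict:
    """Count retry attempts and report whether the client stopped early."""
    attempts = [(int(d), int(n), int(m)) for d, n, m in RETRY_RE.findall(transcript)]
    last_attempt = attempts[-1][1] if attempts else 0
    max_attempts = attempts[-1][2] if attempts else 0
    gave_up = FINAL_ERROR in transcript
    return {
        "attempts_seen": last_attempt,
        "max_attempts": max_attempts,
        "gave_up_early": gave_up and last_attempt < max_attempts,
    }


sample = """\
  ⎿  API Error (Request timed out.) · Retrying in 1 seconds… (attempt 1/10)
  ⎿  API Error (Request timed out.) · Retrying in 1 seconds… (attempt 2/10)
  ⎿  API Error: Request timed out.
"""
print(summarize_retries(sample))
# {'attempts_seen': 2, 'max_attempts': 10, 'gave_up_early': True}
```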

thomasj02 avatar Jun 03 '25 16:06 thomasj02

@thomasj02 sorry to hear you're still running into retry issues.

Were you able to try setting some of the debug flags ANTHROPIC_LOG=debug DEBUG=1 or --debug --verbose, to see if there's anything strange like a x-should-retry header being set to false?

ant-kurt avatar Jun 03 '25 17:06 ant-kurt

Even with --debug --verbose no additional info is printed.

> Let's make it static
  ⎿  API Error (Request timed out.) · Retrying in 1 seconds… (attempt 1/10)
  ⎿  API Error: Request timed out.

╭─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ >                                                                                                                                                                                               │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
  ? for shortcuts                                                                                                                                                                               ◯
                                                                                                                                                                                       Debug mode
                                                                                                                                                                                    104656 tokens
                                                                                                                                                             Context left until auto-compact: 37%

thomasj02 avatar Jun 03 '25 20:06 thomasj02

Using the env vars instead of the command line flags added a ton of information and I was able to catch it here:

[log_ef83ec, request-id: "req_011CPmw17sJZnh25C4hXjXfu"] post https://api.anthropic.com/v1/messages?beta=true succeeded with status 200 in 9016ms
[log_ef83ec] response start {
  url: 'https://api.anthropic.com/v1/messages?beta=true',
  status: 200,
  headers: {
    'anthropic-organization-id': '<redacted>',
    'anthropic-ratelimit-unified-fallback-percentage': '0.5',
    'anthropic-ratelimit-unified-reset': '1748984400',
    'anthropic-ratelimit-unified-status': 'allowed',
    'cf-cache-status': 'DYNAMIC',
    'cf-ray': '94a1fe60689be824-ORD',
    connection: 'keep-alive',
    'content-encoding': 'gzip',
    'content-type': 'application/json',
    date: 'Tue, 03 Jun 2025 20:38:25 GMT',
    'request-id': 'req_011CPmw17sJZnh25C4hXjXfu',
    server: 'cloudflare',
    'strict-transport-security': 'max-age=31536000; includeSubDomains; preload',
    'transfer-encoding': 'chunked',
    via: '1.1 google',
    'x-robots-tag': 'none'
  },
  durationMs: 9016
}
[log_ef83ec] response parsed {
  url: 'https://api.anthropic.com/v1/messages?beta=true',
  status: 200,
  body: {
    id: 'msg_015U8Na3og1Kcsd67GWRMfWm',
    type: 'message',
    role: 'assistant',
    model: 'claude-opus-4-20250514',
    content: [ [Object] ],
    stop_reason: 'tool_use',
    stop_sequence: null,
    usage: {
  ⎿  API Error: Request timed out.

To be clear that's the full JSON at the end, it doesn't print the rest of the JSON after the API Error line
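
Since the debug output is long, a small Python sketch like the following could help check it for the x-should-retry header mentioned earlier. It assumes the header block is printed in the same `'name': 'value'` style as the log above; `header_values` is a hypothetical helper, not something the CLI provides.

```python
import re
from typing import List


def header_values(debug_log: str, name: str) -> List[str]:
    """Return every value logged for a header, quoted or unquoted key."""
    pattern = re.compile(
        r"^\s*'?" + re.escape(name) + r"'?:\s*'([^']*)'",
        re.MULTILINE,
    )
    return pattern.findall(debug_log)


# A few header lines copied from the log above.
sample_log = """\
  headers: {
    'anthropic-ratelimit-unified-status': 'allowed',
    connection: 'keep-alive',
    'x-robots-tag': 'none'
  },
"""
print(header_values(sample_log, "x-should-retry"))  # []
print(header_values(sample_log, "anthropic-ratelimit-unified-status"))  # ['allowed']
```

An empty list for x-should-retry only means the header was not in the captured text; it says nothing about responses that were never logged.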

thomasj02 avatar Jun 03 '25 20:06 thomasj02

I have the latest version available via claude update, and I'm on a Max account:

Image

payoff avatar Jun 04 '25 10:06 payoff

@thomasj02 we're still looking into this, and may push an updated version with some additional logging. The logs you shared are really interesting, in that they don't show any failure. I'm guessing this is related to the streaming being interrupted, since the initial response looks OK.

ant-kurt avatar Jun 05 '25 01:06 ant-kurt

Also have the same issue on 1.0.15 and 1.0.11

Image

w4sspr avatar Jun 06 '25 01:06 w4sspr

Possibly related (not specific to this activity):

Image

smat-dev avatar Jun 06 '25 03:06 smat-dev

Having the same issue.

Occasionally ~200 tokens will go through, then it outputs "[ERROR] Error streaming, falling back to non-streaming mode: Request timed out." and "Request timed out". Never recovers.

Just subbed an hour ago and am trying to /init Claude:

Image

Edit - this is about as far as it gets before going offline:

Image

JeremyTCD avatar Jun 06 '25 07:06 JeremyTCD

FYI, the issue seems to "heal" when I walk away and come back an hour later: it's good for another request or two, before going offline again:

[ERROR] Error streaming, falling back to non-streaming mode: Request timed out.
  ⎿  API Error (Request timed out.) · Retrying in 1 seconds… (attempt 1/10)
...
✻ Doing… (131s · ↑ 0 tokens · esc to interrupt · offline)
...
 ? for shortcuts                                                   Debug mode
                                                                  14169 tokens

Issue #1771 describes the same symptoms but claims to be a different use case: that Claude Code works "locally" but fails in a container. (I am running in a container. I assume everyone reporting this bug runs in a container, but I dunno... no one is mentioning this.)

In my case, I attempted to check account limits directly using curl. I got back an error: "Your credit balance is too low to access the Anthropic API." which can't be right as this is a week-old, unused max account. However, I may have used the wrong API key when checking; I am confused about which API key is actually being used. First use of Claude Code sets itself up automatically, without telling me which keys it selected. Is it possible that this happens whenever Claude Code auto-attaches to the wrong API keys?

linas avatar Jun 08 '25 02:06 linas

Further evidence that this is account-related, rather than npm related:

  • Logging out and logging back in does not reset the system state.
  • Creating a brand new container with a fresh npm install of claude code times out on the very first try.
  • Creating a brand new container, and installing an older version npm install -g @anthropic-ai/[email protected] times out on the second chat interaction. (It also installs the newest version 1.0.17 in the background)

And yet...

  • Working with the container that auto-upgraded itself, and restarting claude (without logging out) allows me to chit-chat with it for 7 exchanges. On the 8th exchange, I gave it a real task: to look at this specific bug report. It fetched it, analyzed it, was reporting on what it found, and half-way through that report, it timed out.
  • The other two containers continue to be unresponsive.

Network issues? The above prompted me to take a look at the firewall.

  • I am coming under heavy attack from the develooper.com domain, which is splattering me with ICMP redirects from a bunch of subdomains it controls.
  • As of just over a week, I am also under fairly intense ssh password-guessing attempts.
  • The usual stream of martian packets, no upticks, downticks.

None of the above should in any way affect claude-code, unless some attacker has managed to interpose themself between you and me.

I will attempt to collect some network traces from claude, to see if I can spot anything weird.

linas avatar Jun 09 '25 03:06 linas

A network analysis.

  • When it is not working it will send one packet of 52 bytes to 160.79.104.10:443 and receive zero response packets.
  • The 443 port is the https port. Lack of response packets implies that the negotiation for encryption on that port never even starts!! (It's been a while since I read the spec, but I thought there was always some negotiation before data started flowing)
  • That socket stays open (connected) for many many minutes, eventually closing due to timeout after maybe ten minutes.
  • This ip addr is some server admined by intellispace.net

When it's working:

  • It opens multiple connections to 160.79.104.10:443 and 34.36.57.103:443 with anywhere from 8 to 40 packets exchanged, with anywhere from 5KB to 50KB passing through.
  • Typically about 8 or 10 connections are needed to handle a single response from claude, including maybe 250K of data. That's a surprisingly huge amount for a simple chat "claude are you there?" question.
  • The 34.36.57.103 IP address is googleusercontent.com ... I guess Claude is hosted on the Google cloud!?

When it stops working again:

  • Like the first time, except ... some of the still-open sockets to 160.79.104.10:443 come alive, and get a response (!!!) trade a dozen packets, 5K bytes.
  • At least one connection to 34.36.57.103 is made, a dozen packets get exchanged, for 25K bytes,
  • Despite this network activity, the UI is just printing API Error (Request timed out.) messages.

This is .. strange.

  • From the first batch of 'not working' packet traces, it suggests that the npm code is NOT to blame, because the server never ever responds.
  • From the second batch of not working packets, it seems there is some partial connectivity, just not enough to make the client happy.

But wait there's more:

  • As I type this, watching idly, I see that there are connections where the server 160.79.104.10:443 sends me a single packet of 52 bytes, and I never respond. That seemed strange, but it seems like these are working sockets. They are held open, waiting for me to talk to claude.
  • And, as I write this, claude is working again ... for one exchange, then the timeouts resume ... and now there are zero open sockets.
  • ... but now, all of a sudden, there's DNS activity every few seconds. Maybe it is trying to find alternate servers??
  • .. and now, after about 5-10 minutes, there are at least 22 sockets that opened up.
  • .. despite this abundance of open sockets, claude thinks it's offline, and it's doing the timeout thing.

I give up. Clearly, there's a bunch of different network behaviors that are only loosely connected to what the client is doing.
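
One way to narrow down the "negotiation never even starts" observation without a packet capture is to test the TCP connect and the TLS handshake as separate steps. The following is a rough Python sketch using only the standard library; `probe_tls` and its return strings are made up for illustration, and it says nothing about what the claude-code client itself does.

```python
import socket
import ssl


def probe_tls(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Try one connection and report which stage failed, if any.

    Returns 'tcp_failed' if the TCP connection cannot be opened,
    'tls_failed' if TCP connects but the TLS handshake does not finish
    (timeout, reset, or certificate error), and 'ok' otherwise.
    """
    try:
        raw = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        return "tcp_failed"
    try:
        context = ssl.create_default_context()
        # The handshake happens inside wrap_socket and inherits the timeout.
        with context.wrap_socket(raw, server_hostname=host):
            return "ok"
    except OSError:  # ssl.SSLError and socket timeouts are OSError subclasses
        return "tls_failed"
    finally:
        raw.close()
```

Calling `probe_tls("api.anthropic.com")` (the host that appears in the debug logs above) while the client is timing out, and again while it is working, would show whether the stall is before or after the TCP connection is established.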

linas avatar Jun 09 '25 04:06 linas

This remark: https://github.com/anthropics/claude-code/issues/1608#issuecomment-2946534081 and also this remark: https://github.com/anthropics/claude-code/issues/1608#issuecomment-2951322239 both claim that downgrading to version 1.0.6 works. I tried that just now; it does not change the behavior at all. I also tried version 1.0.3 just now, and version 1.0.7 last night. All of them behave the same way.

This suggests it's not the claude-code npm package itself, but one of two other things:

  • It's one of the dependencies. Presumably some package that claude uses for socket I/O. I hoped to try to downgrade that package, but I cannot figure out how to list npm package dependencies for claude -- they seem to be hidden or proprietary, coming up blank.
  • It's something on the server side. Perhaps mismanagement of account credentials, of authorization, or of usage limits. Of the two servers that claude uses, the googleusercontent.com server seems to always work, while the intellispace.net server is the one that is non-responsive.

linas avatar Jun 09 '25 19:06 linas

Behavior seems to be time-of-day dependent: last night (Sunday night, US Central time) I got up to 8 exchanges before the timeouts started happening. Right now (mid-day Monday US Central, noon California time) I cannot get even a single exchange to work: the very first one times out. This suggests that some server is overloaded.

linas avatar Jun 09 '25 19:06 linas

Coworkers have discovered that using a VPN works around this problem. I just now started using TOR, and can confirm that claude works over TOR. To verify that it's TOR that made a difference, I have a second session running, without TOR, in a different container, and that one times out just like always.

Here's the trick:

torsocks -q -a <ip of your tor server> -P 9050 --shell
# verify that tor is being used
curl https://check.torproject.org/ | less
# Start claude, using the TOR wrapper
claude --debug --verbose

My IP, that doesn't work: 67.198.37.16 is located in Austin TX, part of grandenetworks.net a local Texas provider.

I checked twice; both TOR exit nodes worked:

  • tor-exit-anonymizer.appliedprivacy.net (109.70.100.4) which is located in Vienna, Austria
  • tor-exit-59.for-privacy.net (185.220.101.59) which is located in Berlin

So the work-around for this issue is to use a VPN or to use TOR.

linas avatar Jun 09 '25 19:06 linas

FYI, Doing a traceroute to the non-responsive claude server shows:

 5  ae0.0.core02.smrctx.grandecom.net (24.155.121.139)  12.511 ms  12.520 ms  12.434 ms
 6  66-90-138-25.static.grandenetworks.net (66.90.138.25)  16.404 ms  22.152 ms  15.084 ms
 7  (160.79.104.10)  7.740 ms  8.282 ms  13.024 ms

FWIW, smrctx is an abbreviation for "San Marcos, Texas" about 30 miles from here. It's the location of the Grande headquarters. The last hop indicates that the intellispace.net server that is failing is directly, immediately attached to the Grande network; that is, the traceroute does not show any hops to California, as would be quite typical for hi-tech corps. The ping time also indicates that the server is located right here, in Texas.

Presumably, the intellispace.net server is actually just some exit node for some Anthropic VPN, so that the actual network between San Marcos and wherever the "real" claude servers are is hidden. This implies that the Anthropic VPN between here and wherever is broken.

It is also possible that the connection between my internet provider and intellispace.net is broken; i.e. that the gateway between 66-90-138-25.static.grandenetworks.net and 160.79.104.10 is misconfigured or overwhelmed. That might be the fault of my internet provider. Given that Grande is 99.99% flawless on everything else (excluding the time the repair guy showed up with a 48V power supply), this seems unlikely.
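
The ping-time argument can be made concrete with a little arithmetic. The Python sketch below pulls the round-trip times out of a traceroute line and converts the smallest one into an upper bound on distance, assuming light travels roughly 200 km per millisecond in optical fiber. The helper names are hypothetical, and the bound ignores queuing and routing detours, so the real distance can only be shorter.

```python
import re
from typing import List

FIBER_KM_PER_MS = 200.0  # approximate speed of light in fiber (about 2/3 of c)


def hop_rtts_ms(traceroute_line: str) -> List[float]:
    """Extract the round-trip times (in ms) from one traceroute hop line."""
    return [float(x) for x in re.findall(r"(\d+(?:\.\d+)?) ms", traceroute_line)]


def max_one_way_km(rtt_ms: float) -> float:
    """Upper bound on the one-way distance implied by a round-trip time."""
    return rtt_ms / 2 * FIBER_KM_PER_MS


last_hop = " 7  (160.79.104.10)  7.740 ms  8.282 ms  13.024 ms"
best = min(hop_rtts_ms(last_hop))
print(best, round(max_one_way_km(best)))  # 7.74 774
```

A best round trip of 7.74 ms puts the responding machine within about 774 km of fiber, which is consistent with a Texas location and too short for a round trip to California (roughly 2,000 km away).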

linas avatar Jun 09 '25 20:06 linas

Well, that didn't last long. Both TOR exit nodes are now failing. I actually got some serious work done for about half an hour, and now I'm dead in the water again.

... and now it works again. So it appears to be some intermittent issue when using TOR. Meanwhile, the non-torified version of claude remains unusable; it will chat for 2-3 exchanges, and then time out once it's given anything non-trivial to do.

linas avatar Jun 09 '25 20:06 linas

Why can't they fix it?? It's just so, so bad. I have exactly the same issue.

michalss avatar Jun 11 '25 11:06 michalss

I've had this problem since I started using it (about one month ago, and I always stay up to date); it's really very annoying. Sometimes when I ask it to resume after the retry it works, and other times not at all. It's a huge problem and it seems like many of us are affected...

VincentCassiau avatar Jun 11 '25 12:06 VincentCassiau

I'm experiencing this problem with Claude Code inside a docker container.

  • On the host system (macOS) the problem does not exist.
  • There is no firewall in the container that could block requests.
  • The status line often says "offline" when, in fact, the system is online.
  • /logout from subscription (Pro) and /login with API keys did not solve the problem.
  • downgrade to v1.0.10 (released about a week ago, when the issue did not appear to exist) did not solve the problem.
  • It doesn't seem to be connected with the 5-hours window for rate limiting. I've been waiting but the timeouts came back instantly.
╭───────────────────────────────────────────────────╮
│ ✻ Welcome to Claude Code!                         │
│                                                   │
│   /help for help, /status for your current setup  │
│                                                   │
│   cwd: /app                                       │
╰───────────────────────────────────────────────────╯


 ※ Tip: Hit Enter to queue up additional messages while Claude is working.

 With the $100/mo Max plan, use Sonnet 4 as your daily driver with predictable pricing. • /upgrade to sign up


> Check the previous commit [XXXXXXXX]. Then implement system specs to describe [XXX].
  ⎿  API Error (Request timed out.) · Retrying in 1 seconds… (attempt 1/10)
  ⎿  API Error (Request timed out.) · Retrying in 1 seconds… (attempt 2/10)
  ⎿  API Error (Request timed out.) · Retrying in 2 seconds… (attempt 3/10)
  ⎿  API Error (Request timed out.) · Retrying in 5 seconds… (attempt 4/10)
  ⎿  API Error (Request timed out.) · Retrying in 10 seconds… (attempt 5/10)
  ⎿  API Error (Request timed out.) · Retrying in 17 seconds… (attempt 6/10)
  ⎿  API Error (Request timed out.) · Retrying in 32 seconds… (attempt 7/10)
  ⎿  API Error (Request timed out.) · Retrying in 33 seconds… (attempt 8/10)
  ⎿  API Error (Request timed out.) · Retrying in 32 seconds… (attempt 9/10)
  ⎿  API Error (Request timed out.) · Retrying in 34 seconds… (attempt 10/10)
  ⎿  API Error: Request timed out.
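
The delays printed above (1, 1, 2, 5, 10, 17, 32, 33, 32, 34 seconds) look like a doubling backoff that levels off around 32 seconds with a little random jitter added. The Python sketch below reproduces that shape; the base, cap, and jitter values are guesses chosen to resemble the printed numbers and are not taken from the Claude Code source.

```python
import random


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 32.0,
                  jitter: float = 0.25, rng=random.random) -> float:
    """Seconds to wait before retry number `attempt` (1-based).

    The delay doubles each attempt, is capped at `cap`, and is then
    stretched by a random factor between 1 and 1 + jitter.
    """
    delay = min(cap, base * 2 ** (attempt - 1))
    return delay * (1 + jitter * rng())


# Without jitter the schedule is deterministic.
print([backoff_delay(n, rng=lambda: 0.0) for n in range(1, 11)])
# [0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 32.0, 32.0, 32.0]
```

With these assumed parameters the ten waits add up to between 159.5 and about 199 seconds, so a full run of ten failed attempts spends roughly three minutes waiting, in addition to however long each request takes to time out.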

Debug output when running with ANTHROPIC_LOG=debug DEBUG=1 claude:

[log_dc8395] sending request {
  method: 'post',
  url: 'https://api.anthropic.com/v1/messages?beta=true',
  options: {
    method: 'post',
    path: '/v1/messages?beta=true',
    body: {
      model: 'claude-3-5-haiku-20241022',
      max_tokens: 512,
      messages: [Array],
      system: [Array],
      temperature: 1,
      metadata: [Object],
      stream: true
    },
    timeout: 60000,
    signal: AbortSignal { aborted: false },
    stream: true
  },
  headers: {
    accept: 'application/json',
    'anthropic-dangerous-direct-browser-access': 'true',
    'anthropic-version': '2023-06-01',
    'content-type': 'application/json',
    'user-agent': 'claude-cli/1.0.19 (external, cli)',
    'x-api-key': '***',
    'x-app': 'cli',
    'x-stainless-arch': 'arm64',
    'x-stainless-helper-method': 'stream',
    'x-stainless-lang': 'js',
    'x-stainless-os': 'Linux',
    'x-stainless-package-version': '0.51.0',
    'x-stainless-retry-count': '0',
    'x-stainless-runtime': 'node',
    'x-stainless-runtime-version': 'v18.20.8',
    'x-stainless-timeout': '60'
  }
}
[log_9fc211] connection timed out - retrying, 1 attempts remaining
[log_9fc211] connection timed out (retrying, 1 attempts remaining) {
  url: 'https://api.anthropic.com/v1/messages?beta=true',
  durationMs: 10006,
  message: 'fetch failed',
  retryOf: 'log_38fb8b'
}

AndreasBaumgart avatar Jun 11 '25 18:06 AndreasBaumgart

Use TOR or a VPN. Works fine for me over TOR.

linas avatar Jun 11 '25 18:06 linas

> Use TOR or a VPN. Works fine for me over TOR.

Thanks @linas for the tip. I hope we can have a more permanent solution soon !

VincentCassiau avatar Jun 11 '25 20:06 VincentCassiau

The problem with the work-around of using torsocks for the entire shell is that it wrecks network connectivity for MCP. I'm currently attempting to work around this by limiting TOR use to Claude only, by running

torsocks -q claude

... but no matter what, I cannot run MCP, because TOR is forcing all the MCP traffic to go over TOR as well, and that does not work. (My MCP servers are not public-facing, so TOR exit nodes could never find them. Hmmm!? !!)

linas avatar Jun 12 '25 02:06 linas

Occurs on macOS 14.4.1 & ghostty. I'm using the Max plan. It doesn't seem to have anything to do with rate limiting. I'm in the same situation as everyone else: it retries "API Error (Request timed out.) · Retrying in 1 seconds… (attempt 1/10)..." multiple times. I'm on Japan time; it tends to work in the mornings, and errors often occur in the evening.

$ sw_vers
ProductName:            macOS
ProductVersion:         14.4.1

$ claude -v
1.0.21 (Claude Code)

nokonoko1203 avatar Jun 12 '25 08:06 nokonoko1203

Timeouts now occur more frequently. Perhaps these issues are related: #2004 #2000 #1999

nokonoko1203 avatar Jun 12 '25 12:06 nokonoko1203

me too. please fix it quickly.

githubkyo avatar Jun 12 '25 12:06 githubkyo

Still bugged in 1.0.22

VincentCassiau avatar Jun 13 '25 14:06 VincentCassiau

Still bugged. I can't work with Claude Code anymore. Before, making a 1hr pause would stop the issue for a while, but now I just restarted Claude and it gives the error right away, just after a couple of prompts. And I'm on a Max plan, far from the rate limits.

Emasoft avatar Jun 13 '25 19:06 Emasoft

still broken

ryanwkan avatar Jun 14 '25 09:06 ryanwkan

Is there a way to extend the timeout limit as a temporary workaround? I'm wasting money here for every day this issue continues...

Emasoft avatar Jun 16 '25 06:06 Emasoft

@Emasoft run it through torsocks as illustrated above. If you also need MCP, you can tunnel that out by writing MCP traffic to a unix domain socket, and then on the other side of that, pushing it to your TCP/IP server. The tunneling is needed, as otherwise torsocks blocks your local network connections. I can give you scripts to do that or you can ask claude to make them for you. The ones I'm using to work around the MCP blocking are here: https://github.com/opencog/cogserver/tree/master/examples/mcp -- the two *py scripts.
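
To give a feel for what such a tunnel involves, here is a rough Python sketch of the relay half: it listens on a unix domain socket and copies bytes to and from a TCP server. This is not the linked scripts, just an illustration under the assumption that the relay runs outside torsocks while the MCP client connects to the socket file; `serve_unix_to_tcp` and the example socket path are made-up names.

```python
import os
import socket
import threading


def _pump(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until src reaches end-of-stream."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)  # pass the end-of-stream along
        except OSError:
            pass


def serve_unix_to_tcp(unix_path: str, tcp_host: str, tcp_port: int) -> socket.socket:
    """Listen on unix_path and relay each client to tcp_host:tcp_port.

    Returns the listening socket so the caller can close it when done.
    """
    if os.path.exists(unix_path):
        os.unlink(unix_path)
    listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    listener.bind(unix_path)
    listener.listen()

    def accept_loop() -> None:
        while True:
            try:
                client, _ = listener.accept()
            except OSError:
                return  # listener was closed
            upstream = socket.create_connection((tcp_host, tcp_port))
            for a, b in ((client, upstream), (upstream, client)):
                threading.Thread(target=_pump, args=(a, b), daemon=True).start()

    threading.Thread(target=accept_loop, daemon=True).start()
    return listener
```

For example, `serve_unix_to_tcp("/tmp/mcp.sock", "127.0.0.1", 8080)` would expose a local server on port 8080 through `/tmp/mcp.sock`; both values are placeholders for whatever your MCP server actually uses.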

linas avatar Jun 16 '25 19:06 linas