[BUG] API Error (Connection error.) with TypeError (fetch failed)

Open pm0code opened this issue 3 months ago • 9 comments

Preflight Checklist

  • [x] I have searched existing issues and this hasn't been reported yet
  • [x] This is a single bug report (please file separate reports for different bugs)
  • [x] I am using the latest version of Claude Code

What's Wrong?

When making calls to the Claude API, the system is failing with a connection error. The underlying error is reported as a TypeError (fetch failed). The client-side SDK attempts to retry the request with an exponential backoff strategy, but the connection continues to fail across multiple attempts.

Steps to Reproduce: Start Claude from scratch or execute a script or application that makes a request to the Claude API.

The issue appears to be intermittent and may be related to network instability or a client-side configuration issue.

Observe the console or log output for API errors.

Expected Behavior: The API request should complete successfully on the first attempt or succeed after one or two retries if a transient network issue occurs.

Actual Behavior: The API call fails repeatedly, triggering the retry mechanism. Each attempt logs an API Error (Connection error.) followed by a TypeError (fetch failed). The issue persists for at least five attempts, with the retry delay increasing each time.

Environment Details:

OS: [Please specify your operating system, e.g., macOS Sonoma, Windows 11, Ubuntu 22.04]

Runtime: [Please specify your runtime, e.g., Node.js v20.11.0, Python 3.10]

SDK/Library: [Please specify the library and version, e.g., @anthropic-ai/sdk v0.20.8]

What Should Happen?

Claude should operate normally.

Error Messages/Logs

Error Logs
  ⎿  API Error (Connection error.) · Retrying in 4 seconds… (attempt 4/10)
    ⎿  TypeError (fetch failed)
  ⎿  API Error (Connection error.) · Retrying in 8 seconds… (attempt 5/10)
    ⎿  TypeError (fetch failed)
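
The two retry lines above (4 seconds at attempt 4, 8 seconds at attempt 5) are consistent with a delay that doubles on each attempt. A minimal Python sketch of such a schedule, for illustration only: the formula and the `backoff_delay` name are assumptions inferred from these two log lines, not Claude Code's actual retry implementation.

```python
def backoff_delay(attempt: int, base: float = 1.0) -> float:
    """Hypothetical doubling schedule: base * 2**(attempt - 2) seconds.

    Chosen only so that attempt 4 waits 4 s and attempt 5 waits 8 s,
    which is what the log shows. The real client may differ (jitter, caps).
    """
    if attempt < 2:
        return 0.0  # assumption: the first attempt is sent without delay
    return base * 2 ** (attempt - 2)


# Print the schedule this assumption would give for a 10-attempt budget.
for attempt in range(2, 11):
    print(f"attempt {attempt}/10: retrying in {backoff_delay(attempt):g} seconds")
```

If the schedule really doubles like this, the later attempts wait minutes rather than seconds, which would match the impression that the failure "persists" for a long time.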

Steps to Reproduce

Just start Claude from the console.

Claude Model

None

Is this a regression?

Yes, this worked in a previous version

Last Working Version

No response

Claude Code Version

Current version: 1.0.120

Platform

Anthropic API

Operating System

macOS

Terminal/Shell

Terminal.app (macOS)

Additional Information

No response

pm0code Sep 22 '25 21:09

Same issue here, and it still persists. Is nobody looking into it, even though it was reported about 6 hours ago?

viper7882 Sep 23 '25 02:09

In my experience, after Anthropic fixed the Claude app and API connectivity errors, certain users were required to re-authenticate their account, but the API error did not prompt for that. The workaround is to run claude /login

viper7882 Sep 23 '25 08:09

The issue in my case is that it is a random behavior. It works sometimes, and then all of a sudden it does not, and regardless of what trick/workaround I apply, including the above, it remains the same until, say, an hour later ?!?

pm0code Sep 24 '25 19:09

Same here on Windows

MarcusJellinghaus Sep 26 '25 13:09

Same here on Mac M1

nntruongmuse Oct 19 '25 08:10

Installing claude through npm instead of nix has solved this issue for me; bug report in nixpkgs

alvaroaleman Oct 20 '25 18:10

> Installing claude through npm instead of nix has solved this issue for me; bug report in nixpkgs

Thank you. I installed it in Docker using npm: RUN npm install -g @anthropic-ai/claude-code

nntruongmuse Oct 22 '25 09:10

This issue has been inactive for 30 days. If the issue is still occurring, please comment to let us know. Otherwise, this issue will be automatically closed in 30 days for housekeeping purposes.

github-actions[bot] Dec 08 '25 10:12

I've been battling this for the last few hours. A summary of what seems to have fixed it (summary by Claude, obvs):

Problem: The Claude Code SDK times out (after 10 seconds) when attempting IPv6 connections to api.anthropic.com, even though IPv4 connections succeed in 2-4 seconds.

Solution:

  1. Kernel-level IPv6 Disable. Applied via: the docker run command or docker-compose.yml
     --sysctl net.ipv6.conf.all.disable_ipv6=1

  2. Force IPv4-only DNS Servers. Applied via: Docker DNS configuration
     --dns 8.8.8.8 --dns 8.8.4.4

  3. Node.js IPv4 Preference. Applied via: Environment variable
     -e NODE_OPTIONS=--dns-result-order=ipv4first

Why all three are needed:

  • Layer 1 (kernel): Disables IPv6 at OS level but doesn't prevent DNS from resolving IPv6 addresses
  • Layer 2 (DNS): Queries Google's public DNS servers over their IPv4 addresses (these resolvers can still return AAAA records, which is why Layer 3 is also needed)
  • Layer 3 (Node.js): Forces Node.js runtime to prefer IPv4 addresses in case any IPv6 addresses leak through

Result: Eliminates intermittent SDK timeouts by ensuring IPv4 is used at every network stack layer.

Context: Applies to any Docker containerized Node.js application experiencing intermittent connection timeouts when IPv6 is unavailable or unreliable.
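
To check whether IPv6 really is the slow path in a given container, one option is to time a TCP connection per address family. The following is a minimal Python sketch, not part of the original fix: the helper names `time_connect` and `compare_families` are hypothetical, and it measures only the TCP handshake (no TLS or HTTP), so absolute numbers will be lower than the full request times quoted above.

```python
import socket
import time
from typing import Dict, Optional


def time_connect(host: str, port: int, family: socket.AddressFamily,
                 timeout: float = 10.0) -> Optional[float]:
    """Return the seconds needed to open a TCP connection over `family`.

    Returns None if no address of that family resolves or the connection
    fails (timeout, unreachable network, refused, IPv6 disabled in kernel).
    """
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return None  # nothing resolved for this address family
    sockaddr = infos[0][4]  # first resolved address for this family
    start = time.monotonic()
    try:
        with socket.socket(family, socket.SOCK_STREAM) as sock:
            sock.settimeout(timeout)
            sock.connect(sockaddr)
    except OSError:
        return None
    return time.monotonic() - start


def compare_families(host: str = "api.anthropic.com",
                     port: int = 443) -> Dict[str, Optional[float]]:
    """Time one connection attempt over IPv4 and one over IPv6."""
    return {
        "IPv4": time_connect(host, port, socket.AF_INET),
        "IPv6": time_connect(host, port, socket.AF_INET6),
    }
```

Calling `print(compare_families())` inside the container gives one timing per family. An IPv6 entry of None next to a fast IPv4 entry would be consistent with the problem described here, although a single sample cannot prove it for an intermittent failure.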


Additionally, even though I started to see much improvement, it got worse as the yolo development continued:

Symptoms Observed

Pattern in Build Logs:

  • Build starts with successful API requests (1-16): ✅ All succeed in 1-4 seconds
  • Intermittent failures begin (17-41): ⚠️ Mix of successes and timeouts
  • Complete failure (42-81): ❌ ALL requests timeout at exactly ~10 seconds
  • Total: 33 successes and 48 timeouts (the last 40 consecutive) in a single fresh container

Timeout Characteristics:

  • Consistent 10-second timeout (10308ms, 10486ms, 10415ms, etc.)
  • Error: fetch failed / connection timed out - error; no more retries left
  • Pattern repeats on every build, regardless of fresh container restarts

Why This Pointed to WSL2 Port Exhaustion

Initial False Leads (Ruled Out):

  1. ❌ Connection pool exhaustion in Node.js - Fixed shared Anthropic client, but issue persisted
  2. ❌ Connection pool in Claude CLI - Fresh container didn't help
  3. ❌ Firewall blocking - Firewall script was disabled (devcontainer.json:23-24)
  4. ❌ IPv6 timeouts - Already fixed with kernel/DNS/Node.js IPv4-only configuration
  5. ❌ DNS resolution - api.anthropic.com resolves correctly to 160.79.104.10
  6. ❌ Cloudflare IP rotation - Curl tests succeeded (405 = API reachable)
  7. ❌ Linux conntrack limits - 262k limit with only 150 entries used

The Smoking Gun:

On WSL2 host:

$ cat /proc/sys/net/ipv4/ip_local_port_range
60700 61000

Only 300 ephemeral ports available (61000 - 60700 = 300)
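
The same check can be scripted. This is a minimal Python sketch, assuming a Linux or WSL2 environment where the file above exists; the function names are hypothetical. Counting both endpoints, the range 60700-61000 contains 301 ports, which is the "about 300" quoted above.

```python
from pathlib import Path
from typing import Tuple

PORT_RANGE_FILE = "/proc/sys/net/ipv4/ip_local_port_range"


def parse_port_range(text: str) -> Tuple[int, int]:
    """Parse the two whitespace-separated integers the kernel reports."""
    low, high = (int(field) for field in text.split())
    return low, high


def port_count(low: int, high: int) -> int:
    """Number of ephemeral ports in the range, both endpoints included."""
    return high - low + 1


def read_port_range(path: str = PORT_RANGE_FILE) -> Tuple[int, int]:
    """Read the current ephemeral port range (Linux/WSL2 only)."""
    return parse_port_range(Path(path).read_text())


# The value reported in this comment, parsed from a string so the
# example also runs on machines without /proc.
low, high = parse_port_range("60700 61000")
print(f"{low}-{high}: {port_count(low, high)} ports")
```

On an affected machine, `port_count(*read_port_range())` returning a few hundred rather than tens of thousands would point at the same misconfiguration.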

Why This Causes the Pattern:

  1. Port Exhaustion Math:
    • Each API request = 1 outbound connection through WSL2 NAT
    • Closed connections remain in TIME-WAIT state for 60 seconds
    • With 300 ports available:
      • ~40 requests at 2-3s each = 80-120 seconds of requests
      • First ~15 requests complete before TIME-WAIT accumulates
      • After that, ports start getting exhausted
      • By request #42, all 300 ports are in TIME-WAIT → complete failure

  2. Connection Flow (Why WSL Port Range Matters):

    Container Process (Claude CLI)
        ↓ container port (ephemeral from container's 32768-60999)
    Docker Bridge NAT
        ↓ host port (ephemeral from WSL's 60700-61000) ← BOTTLENECK
    WSL2 Network Stack
        ↓
    Windows Host
        ↓
    Internet (api.anthropic.com)

  3. The WSL2 → Windows NAT is the bottleneck, not container → Docker NAT.

  4. Why Fresh Containers Don't Help:
    • Port exhaustion is at the WSL kernel level, not container level
    • Containers share the same WSL kernel and its port range
    • Restarting container doesn't clear WSL's TIME-WAIT table
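
A back-of-the-envelope way to sanity-check the port arithmetic in this list is to estimate how many ports sit in TIME-WAIT at a steady connection rate. The Python sketch below assumes that every closed connection holds exactly one ephemeral port for the full 60-second TIME-WAIT period and that nothing else frees or reuses ports; the function names are hypothetical.

```python
def ports_in_time_wait(connections_per_second: float,
                       time_wait_seconds: float = 60.0) -> float:
    """Steady-state number of ports held in TIME-WAIT.

    Simplified model: each new connection later occupies one port for
    `time_wait_seconds`, so occupancy is rate multiplied by hold time.
    """
    return connections_per_second * time_wait_seconds


def min_rate_to_exhaust(ports_available: int,
                        time_wait_seconds: float = 60.0) -> float:
    """Connection rate (per second) at which the whole range is in TIME-WAIT."""
    return ports_available / time_wait_seconds


for label, ports in (("restricted range", 300), ("standard Linux range", 28232)):
    rate = min_rate_to_exhaust(ports)
    print(f"{label}: {ports} ports exhausted at about {rate:.1f} new connections/s")
```

Under these assumptions the 300-port range saturates at about 5 new connections per second, while the 28,232-port range tolerates roughly 470 per second. The number of connections opened per request, and any other traffic sharing the same range, therefore matters as much as the request count itself.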

The WSL2 Issue Explained

Normal Linux Behavior:

  • Standard ephemeral port range: 32768-60999 (28,232 ports)
  • Supports thousands of concurrent connections
  • TIME-WAIT connections (60s) don't cause issues with this many ports

WSL2 Default Behavior (Pre-Fix):

  • Windows default range: 49152-65535 (16,384 ports) - reasonable
  • But somehow WSL kernel had: 60700-61000 (300 ports) - catastrophic
  • This may be due to:
    • WSL2 inheriting a restricted Windows configuration
    • Previous network configuration changes
    • WSL2 version-specific bug/default

Impact:

  • With only 300 ports, after ~40-50 requests all ports are exhausted
  • New connections fail immediately (connection refused)
  • Anthropic SDK waits 10 seconds then times out
  • Pattern: Works fine → gradual degradation → complete failure

The Fix Applied

Step 1: Increase Windows Ephemeral Port Range

Executed in PowerShell (Administrator):

Before (insufficient):

Start Port      : 49152
Number of Ports : 16384

Applied fix:

netsh int ipv4 set dynamicport tcp start=32768 numberofports=28232

After (standard Linux range):

Start Port      : 32768
Number of Ports : 28232

Why These Numbers:

  • Start: 32768 - Standard Linux/IANA ephemeral port range start
  • Count: 28232 - Gives range 32768-60999 (standard Linux range)
  • Matches Docker container defaults for consistency
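
The relationship between the start/count pair that netsh takes and the resulting inclusive port range can be checked with a few lines of Python. This is an illustrative sketch; `dynamic_port_range` is a hypothetical helper, not a Windows or netsh API.

```python
from typing import Tuple


def dynamic_port_range(start: int, number_of_ports: int) -> Tuple[int, int]:
    """Convert a (start, count) pair into an inclusive (first, last) port range."""
    last = start + number_of_ports - 1
    if start < 1 or number_of_ports < 1 or last > 65535:
        raise ValueError("range must lie within ports 1-65535")
    return start, last


# Values quoted in this comment: the earlier Windows setting and the applied fix.
print(dynamic_port_range(49152, 16384))  # (49152, 65535)
print(dynamic_port_range(32768, 28232))  # (32768, 60999)
```

The second result matches the usual Linux default for ip_local_port_range, which is reported as the two numbers 32768 and 60999.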

What to Do After WSL Restart

  1. Restart WSL (already done when you restarted your terminal/WSL). In Windows PowerShell:

     wsl --shutdown

     Then restart WSL by opening a terminal or running: wsl

  2. Verify the fix is applied. Inside WSL, this should now show 32768-61000 (or a similarly wide range):

     cat /proc/sys/net/ipv4/ip_local_port_range

     Expected output: 32768 61000

Restarted the service that runs the docker image with CC CLI on it - now working perfectly.

ap1969 Dec 09 '25 23:12