oathkeeper icon indicating copy to clipboard operation
oathkeeper copied to clipboard

feat: Make 429 passthrough instead return 401

Open sboily opened this issue 6 months ago • 5 comments

Fix: Properly propagate 429 Too Many Requests responses instead of converting to 401

This pull request fixes an issue where Oathkeeper was incorrectly converting 429 (Too Many Requests) responses from upstream services into 401 (Unauthorized) errors. This behavior is problematic for API consumers as it conflates rate limiting with authentication failures, leading to incorrect client behavior.

The Problem

When an upstream service returns a 429 status code (rate limit exceeded), Oathkeeper was treating it as an authentication failure and returning 401. This causes several issues:

  1. Incorrect client behavior: Clients receive 401 and attempt to re-authenticate instead of backing off
  2. Loss of rate limit information: Important headers like Retry-After and rate limit details are lost
  3. Poor developer experience: Developers see authentication errors when the real issue is rate limiting
  4. Breaks HTTP semantics: 401 and 429 have completely different meanings in the HTTP specification

Changes Made

  1. Modified error handling in errors_hydrator.go: Updated the DefaultHydrator to properly recognize and propagate 429 status codes from upstream responses instead of converting them to 401.

  2. Preserved upstream response details: The fix ensures that:

    • The original 429 status code is maintained
    • Rate limiting headers (e.g., Retry-After, X-RateLimit-*) are preserved
    • The detailed error response body from the upstream service is passed through to the client

Impact

This change improves API usability by:

  • Allowing clients to properly distinguish between authentication failures (401) and rate limiting (429)
  • Enabling correct retry behavior with exponential backoff for rate limits
  • Preserving detailed rate limit information for better debugging
  • Following HTTP standards and REST API best practices

Example

Before this fix:

{
  "error": {
    "code": 401,
    "status": "Unauthorized",
    "message": "Access credentials are invalid"
  }
}

After this fix:

{
  "error": {
    "code": 429,
    "status": "Too Many Requests",
    "reason": "{\"detail\":{\"error\":\"rate_limit_exceeded\",\"message\":\"API rate limit exceeded. You have made 71 requests in the current minute, which exceeds your limit of 70 requests per minute.\",\"limit_type\":\"minute\",\"current_usage\":71,\"limit\":70,\"retry_after_seconds\":5,\"reset_time\":1751085840,\"documentation\":\"https://docs.angany.ai/api/rate-limiting\"}}",
    "message": "Too many requests"
  }
}

No breaking changes are introduced by this patch.

Related issue(s)

This fixes a previously unknown bug where 429 responses were being incorrectly converted to 401 errors. The bug can be reproduced by:

  1. Setting up Oathkeeper as a reverse proxy to an upstream service that implements rate limiting
  2. Having the upstream service return a 429 status code when rate limits are exceeded
  3. Observing that Oathkeeper returns a 401 (Unauthorized) error instead of propagating the 429

Checklist

  • [x] I have read the contributing guidelines.
  • [ ] I have referenced an issue containing the design document if my change introduces a new feature.
  • [x] I am following the contributing code guidelines.
  • [x] I have read the security policy.
  • [x] I confirm that this pull request does not address a security vulnerability.
  • [ ] I have added tests that prove my fix is effective or that my feature works.
  • [ ] I have added or changed the documentation.

Further Comments

This fix addresses a significant usability issue for developers building applications that consume APIs through Oathkeeper. Converting rate limit errors to authentication errors violates the principle of least surprise and makes it impossible for clients to implement proper retry logic.

The fix has been tested in a dev environment with high-throughput rate limiting scenarios (100+ requests/second) and correctly propagates 429 responses without any performance impact.

This change aligns Oathkeeper with HTTP standards (RFC 6585) and common API gateway practices where rate limiting and authentication are treated as distinct concerns with different error codes.

This PR is intent to have discussion about this issue, let me know if it incorrect to address it like this. There is also an issue filled here https://github.com/ory/oathkeeper/issues/1167.

sboily avatar Jun 28 '25 05:06 sboily

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Jun 28 '25 05:06 CLAassistant

@wewelll after you approved the changes, what is the next step to move this forward? I would really like to see this fixed in Oathkeeper.

finsterwalder avatar Sep 30 '25 17:09 finsterwalder

@finsterwalder i think i need to check the new tests required because they failed. But it's not clear to me if maintainers are interested about this PR.

sboily avatar Oct 01 '25 22:10 sboily

@sboily Thank you so much for you continuous effort to work on this! hackerman clearly stated, that he thinks your approach is correct, so I think it's likely that it will be merged, once the checks are green. Unfortunately I can't help, since I have no experience with Go. 😬

finsterwalder avatar Oct 02 '25 07:10 finsterwalder

@sboily Thank you so much for you continuous effort to work on this! hackerman clearly stated, that he thinks your approach is correct, so I think it's likely that it will be merged, once the checks are green. Unfortunately I can't help, since I have no experience with Go. 😬

CI is green but i dont change my code 😅, i have just sync main.

sboily avatar Oct 18 '25 18:10 sboily