
OneLogin returns 500s for seemingly no cause

Open apeschel opened this issue 2 years ago • 2 comments

Using saml2aws 2.36.4.

We're having issues with users authenticating to OneLogin. What's strange is that these users only seem to have issues with saml2aws, and are able to authenticate to OneLogin via other tools with no issues.

We have a user in Colombia who gets a 500 error every time he tries to connect to OneLogin using saml2aws. He has the exact same setup and config as the rest of us, but none of us consistently get a 500 error.

Two days ago, I started receiving 500 errors when I attempted to log in. After I spent two days debugging the issue, it just started working again today with no explanation.

I've tried debugging the code, and as far as I can tell, saml2aws is sending the right headers and JSON object to the OneLogin REST API. The OneLogin server just responds with a 500, with no explanation as to why.
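For anyone who wants to compare against saml2aws independently, here is a minimal sketch that rebuilds the two requests seen in the debug dump in this issue (the token request and the SAML assertion request). It only constructs the requests and does not send them. The function names are hypothetical, the `ORG`, `AUTH_TOKEN`, and similar values are the same placeholders used in the dump, and the `bearer: ` header format is copied verbatim from the dump rather than from OneLogin's documentation.

```python
import json
import urllib.request


def _json_body(payload: dict) -> bytes:
    # Compact separators so the body matches the byte layout in the dump.
    return json.dumps(payload, separators=(",", ":")).encode("utf-8")


def build_token_request(org: str, basic_auth: str) -> urllib.request.Request:
    """Rebuild POST /auth/oauth2/v2/token as it appears in the dump."""
    return urllib.request.Request(
        f"https://{org}.onelogin.com/auth/oauth2/v2/token",
        data=_json_body({"grant_type": "client_credentials"}),
        method="POST",
        headers={
            "Accept": "application/json",
            "Authorization": f"Basic {basic_auth}",
            "Content-Type": "application/json",
        },
    )


def build_saml_request(org: str, access_token: str, app_id: str,
                       username: str, password: str) -> urllib.request.Request:
    """Rebuild POST /api/2/saml_assertion as it appears in the dump."""
    return urllib.request.Request(
        f"https://{org}.onelogin.com/api/2/saml_assertion",
        data=_json_body({
            "app_id": app_id,
            "password": password,
            "subdomain": org,
            "username_or_email": username,
        }),
        method="POST",
        headers={
            "Accept": "application/json",
            # Header value format copied from the dump, colon included.
            "Authorization": f"bearer: {access_token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # Placeholders only; nothing is sent over the network here.
    req = build_saml_request("ORG", "ACCESS_TOKEN", "APP_ID", "USERNAME", "PASSWORD")
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing either request to `urllib.request.urlopen` with real values would send it, which makes it possible to check whether the 500 also occurs outside saml2aws from the same network.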

I strongly suspect there is some kind of backend issue with OneLogin, but I cannot figure out what is going on exactly, and I'm hoping someone here might have some insight. I also have an open issue with OneLogin's support to try to get some insight from them as to what's going on in their backend.

I've provided the output from running saml2aws with DUMP_OUTPUT enabled, but with all secrets stripped, to try to provide as much info as possible as to what is going on here.

DEBU[0000] building provider                             command=login idpAccount="account {\n  AppID: APP_ID\n  Subdomain: ORG\n  URL: https://ORG.onelogin.com\n  Username: USERNAME\n  Provider: OneLogin\n  MFA: Auto\n  SkipVerify: false\n  AmazonWebservicesURN: urn:amazon:webservices\n  SessionDuration: 43200\n  Profile: PROFILE\n  RoleARN: ROLE_ARN\n  Region: REGION\n}"
Authenticating as USERNAME ...
DEBU[0000] Generating OneLogin access token              provider=OneLogin
POST /auth/oauth2/v2/token HTTP/1.1
Host: ORG.onelogin.com
User-Agent: saml2aws/1.0 (darwin amd64) Versent
Content-Length: 35
Accept: application/json
Authorization: Basic AUTH_TOKEN
Content-Type: application/json
Accept-Encoding: gzip

{"grant_type":"client_credentials"}
HTTP/1.1 200 OK
Transfer-Encoding: chunked
Cache-Control: max-age=0, private, must-revalidate
Connection: keep-alive
Content-Type: application/json; charset=utf-8
Date: Tue, 21 Mar 2023 21:09:13 GMT
Etag: W/"49692b40934e0742679308ce0516efa2"
Set-Cookie: ol_oapi_canary_15=false; path=/; domain=.onelogin.com; HttpOnly; Secure
Status: 200 OK
Strict-Transport-Security: max-age=63072000; includeSubDomains;
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-Request-Id: 641A1CF9-0A0905DD-99C8-0A0905F4-24E3-67AD67-655C
X-Runtime: 0.028808
X-Xss-Protection: 1; mode=block

10b
{"access_token":"ACCESS_TOKEN","created_at":"2023-03-21T13:10:53.231Z","expires_in":36000,"refresh_token":"REFRESH_TOKEN","token_type":"bearer","account_id":ACCOUNT_ID}
0


DEBU[0001] Retrieved OneLogin OAuth token:ACCESS_TOKEN  provider=OneLogin
DEBU[0001] Requesting SAML Assertion                     provider=OneLogin
POST /api/2/saml_assertion HTTP/1.1
Host: ORG.onelogin.com
User-Agent: saml2aws/1.0 (darwin amd64) Versent
Content-Length: 147
Accept: application/json
Authorization: bearer: ACCESS_TOKEN
Content-Type: application/json
Accept-Encoding: gzip

{"app_id":"APP_ID","password":"PASSWORD","subdomain":"ORG","username_or_email":"USERNAME"}

HTTP/1.1 500 Internal Server Error
Content-Length: 948
Cache-Control: no-cache
Connection: keep-alive
Content-Security-Policy: frame-ancestors 'none';
Content-Type: text/html; charset=utf-8
Date: Tue, 21 Mar 2023 21:09:13 GMT
P3p: CP="CAO DSP COR CURa ADMa DEVa OUR IND PHY ONL UNI COM NAV INT DEM PRE"
Status: 500 Internal Server Error
Strict-Transport-Security: max-age=63072000; includeSubDomains;
X-Content-Type-Options: nosniff
X-Frame-Options: DENY
X-Request-Id: 641A1CF9-0A0905DD-300E-0A09014A-24E3-67ADD2-165B
X-Xss-Protection: 1; mode=block

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
       "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

<head>
  <meta http-equiv="content-type" content="text/html; charset=UTF-8" />
  <title>We're sorry, but something went wrong (500)</title>
        <style type="text/css">
                body { background-color: #fff; color: #666; text-align: center; font-family: arial, sans-serif; }
                div.dialog {
                        width: 25em;
                        padding: 0 4em;
                        margin: 4em auto 0 auto;
                        border: 1px solid #ccc;
                        border-right-color: #999;
                        border-bottom-color: #999;
                }
                h1 { font-size: 100%; color: #f00; line-height: 1.5em; }
        </style>
</head>

<body>
  <!-- This file lives in public/500.html -->
  <div class="dialog">
    <h1>We're sorry, but something went wrong.</h1>
    <p>We've been notified about this issue and we'll take a look at it shortly.</p>
  </div>
</body>
</html>

DEBU[0001] SAML Assertion response code:500              provider=OneLogin
DEBU[0001] SAML Assertion response body:<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
       "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

<head>
  <meta http-equiv="content-type" content="text/html; charset=UTF-8" />
  <title>We're sorry, but something went wrong (500)</title>
        <style type="text/css">
                body { background-color: #fff; color: #666; text-align: center; font-family: arial, sans-serif; }
                div.dialog {
                        width: 25em;
                        padding: 0 4em;
                        margin: 4em auto 0 auto;
                        border: 1px solid #ccc;
                        border-right-color: #999;
                        border-bottom-color: #999;
                }
                h1 { font-size: 100%; color: #f00; line-height: 1.5em; }
        </style>
</head>

<body>
  <!-- This file lives in public/500.html -->
  <div class="dialog">
    <h1>We're sorry, but something went wrong.</h1>
    <p>We've been notified about this issue and we'll take a look at it shortly.</p>
  </div>
</body>
</html>  provider=OneLogin
HTTP 500:
Error authenticating to IdP.
github.com/versent/saml2aws/v2/cmd/saml2aws/commands.Login
        github.com/versent/saml2aws/v2/cmd/saml2aws/commands/login.go:107
main.main
        github.com/versent/saml2aws/v2/cmd/saml2aws/main.go:190
runtime.main
        runtime/proc.go:250
runtime.goexit
        runtime/asm_amd64.s:1598

> There was a problem!
> Please check that you've copied the saml2aws config file to '~/.saml2aws'
> and that file contains a '[PROFILE]' section.

> For debugging, try:
saml2aws --verbose login -a PROFILE

apeschel, Mar 22 '23 17:03

Hey @apeschel,

I happen to be painfully familiar with what you're describing.

We have a long running case where users in Montreal have this issue.

It's easier to explain with this map I drew for our internal incident tracking this:

image

The v1 and v2 endpoints have similar but slightly different problems. This map was drawn for v1, but roughly the same premise applies to v2: some of the OLP endpoints in that region are seemingly hosed, but only via the API.

From memory, the browser login uses their data centre in SF which is why it works fine.

These regional names are inferred (but are probably accurate) based on the rough locations of AWS's data centres. OneLogin also appears to have its own on-prem data centre in San Francisco; I'm pretty sure of that, since all traffic was routed there during their sizeable outage a few months back.

I can write up some more once I'm in the office, but you should look into whether your coworker in Colombia is accessing via a VPN that egresses near us-east-2.

If you've got access to multiple VPN configurations, try picking one that egresses near that area and see if you get the same issue too.
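One way to make that comparison less anecdotal is to repeat the same request a few times per VPN exit and tally the status codes. The sketch below assumes a hypothetical `send` callable that performs one attempt and returns the HTTP status plus the `X-Request-Id` response header (that header appears in the dump above and may be useful to quote in a support ticket). `send_once` and `tally` are illustrative names, not part of saml2aws.

```python
import collections
import urllib.error
import urllib.request


def send_once(req: urllib.request.Request, timeout: float = 10.0):
    """Send one request; return (status, X-Request-Id) even for HTTP errors."""
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, resp.headers.get("X-Request-Id", "")
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status and headers live on the error.
        return err.code, err.headers.get("X-Request-Id", "")


def tally(send, attempts: int):
    """Call send() repeatedly; count statuses and collect IDs of 5xx replies."""
    counts = collections.Counter()
    failed_ids = []
    for _ in range(attempts):
        status, request_id = send()
        counts[status] += 1
        if status >= 500 and request_id:
            failed_ids.append(request_id)
    return counts, failed_ids
```

Running `tally(lambda: send_once(req), 5)` once per exit node, with `req` being whatever request reproduces the failure, gives a per-region count that is easy to compare. Note that each attempt is a real authentication request, so keep the attempt count low to avoid account lockouts.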

OneLogin are seemingly aware of it: I've got an open ticket and they've confirmed that it's a thing. I don't have any details on the internal issue, though, just a lot of stuff I've inferred from many hours of banging my head against a wall, hah

marcus-crane, Mar 22 '23 19:03

@marcus-crane Thanks for your response - I tested your hypothesis as you requested, and it seems to hold true.

I used Tailscale to set up a VPN with exit nodes in us-west-1, us-east-1, and sa-east-1 (São Paulo). I consistently received 500 errors when using us-east-1 or sa-east-1, but was consistently able to authenticate when using us-west-1.

I had our user from Colombia connect using us-west-1, and he was able to authenticate with no issues as well.

I'm not sure what, if anything, can be done on the saml2aws side to fix this.

apeschel, Mar 23 '23 23:03