graylog2-server

UI unusable after 5.1 > 5.2 upgrade with telemetry enabled & `oauth2-proxy` in front

Open der-eismann opened this issue 1 year ago • 4 comments

Expected Behavior

We have been using Graylog for years with oauth2-proxy in front to make use of SSO with the Trusted Headers method. The expected behavior is that users get logged in and can actually use Graylog.

Current Behavior

After upgrading to 5.2, many people started getting an HTTP 400 error, with no content being displayed at all. Turning off the proxy worked, but meant we had to use passwords again. Interestingly, we noticed that it was usable for some people with the proxy in place once telemetry was disabled (like @hydrapolic mentioned). Now this is probably an issue with https://github.com/oauth2-proxy/oauth2-proxy; nevertheless, the Graylog UI shouldn't start failing because the telemetry cookie is malformed or similar.

Possible Solution

Don't panic when the posthog cookie is malformed.

Steps to Reproduce (for bugs)

  1. Run Graylog with oauth2-proxy in front (this is very specific, not sure if I can easily create a config to reproduce it)
  2. Start experiencing errors as long as telemetry is enabled

Context

We were trying to use Graylog.

Your Environment

  • Graylog Version: 5.2.3
  • Java Version: Eclipse Adoptium 17.0.9
  • OpenSearch Version: 2.11.0
  • MongoDB Version: 5.0.14
  • Operating System: Official container
  • Browser version: Chrome 121.0.6167.85

der-eismann, Jan 26 '24 11:01

Hey @der-eismann,

thanks for reporting this! Where do users see the HTTP 400? Is there anything in your browser's console when trying to open the web interface?

dennisoelkers, Jan 26 '24 11:01

The error appeared on every page. After removing the posthog cookie and reloading, the page loaded normally until at some point it suddenly started failing:

[screenshot: screenshot-20240125-162606]

Then, when pressing F5, there's just the Chrome error page (different URL here because we reproduced it in a separate environment to debug it):

[screenshot: screenshot-20240126-121340]

In the console there's just this after removing the cookie and reloading:

GET https://graylog.production.example.com/api/contentstream/tags 400 (Bad Request)

and after pressing F5 again (second screenshot) there's this:

Failed to load resource: the server responded with a status of 400 ()
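
The browser console only reports the status code here. To capture the response body of the failing request as well, something like the following could help. This is a minimal sketch using only the Python standard library; `fetch_status_and_body` is a hypothetical helper name, and it assumes that the `Cookie` header copied from the browser's network tab (including the oauth2-proxy session cookie) is enough to get the request through the proxy.

```python
import urllib.error
import urllib.request


def fetch_status_and_body(url, cookie_header=None, timeout=10.0):
    """Return (status, body_text) for a GET request, even for 4xx/5xx answers.

    Hypothetical helper for debugging; not part of Graylog or oauth2-proxy.
    """
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    if cookie_header:
        # Replay the exact Cookie header copied from the browser's network tab.
        request.add_header("Cookie", cookie_header)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as error:
        # urllib raises on 4xx/5xx, but the error object still carries the body.
        return error.code, error.read().decode("utf-8", errors="replace")


# Example call (the cookie value is a placeholder to be filled in by hand):
# status, body = fetch_status_and_body(
#     "https://graylog.production.example.com/api/contentstream/tags",
#     cookie_header="<paste the Cookie header from the browser here>",
# )
# print(status, body)
```

If the body turns out to be empty, that alone may hint at which component in the chain rejected the request, but that is only a guess until the headers of the response are compared too.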

der-eismann, Jan 26 '24 11:01

After some more debugging I still have no idea why it fails with the proxy. I debug-printed all cookies and compared them with and without oauth2-proxy; there's no difference.
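
For anyone who wants to repeat that comparison, here is a minimal sketch of diffing two raw `Cookie` headers, one captured with the proxy and one without. The function names are hypothetical, and the values are deliberately left undecoded so that encoding differences stay visible. The byte sizes are included only because oversized request headers are one common cause of HTTP 400 responses in general; whether that applies here has not been established in this thread.

```python
def parse_cookie_header(header):
    """Split a raw Cookie header into a {name: value} dict without decoding values."""
    cookies = {}
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


def diff_cookie_headers(direct, proxied):
    """Compare two raw Cookie headers and report differences plus total sizes in bytes."""
    a = parse_cookie_header(direct)
    b = parse_cookie_header(proxied)
    return {
        "only_direct": sorted(a.keys() - b.keys()),
        "only_proxied": sorted(b.keys() - a.keys()),
        "changed": sorted(name for name in a.keys() & b.keys() if a[name] != b[name]),
        "bytes_direct": len(direct.encode("utf-8")),
        "bytes_proxied": len(proxied.encode("utf-8")),
    }


# Example with made-up cookie names and values:
# diff_cookie_headers(
#     "ph_example_posthog=%7B%22a%22%3A1%7D; authentication=abc",
#     "_oauth2_proxy=xyz; ph_example_posthog=%7B%22a%22%3A1%7D; authentication=abc",
# )
```

Note that this only compares what the browser sends; anything the proxy adds or rewrites on the way to Graylog would have to be captured on the server side.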

Also, it's a mystery to me why the telemetry_enabled option isn't documented anywhere. I guess you guys don't want people to disable it, but if it helps as a workaround, it's too hidden.

der-eismann, Jan 26 '24 12:01

Hey @der-eismann,

thanks for digging into this. Is there any response body for the requests failing with 400s?

You are right, telemetry_enabled should be documented. Obviously, we want telemetry to be safe for people to use (both technically and with regard to privacy), so that there is little reason to opt out. But there are always situations like this one, where the option should be documented and known as a last resort if things go sideways.

dennisoelkers, Jan 29 '24 07:01