UI unusable after 5.1 > 5.2 upgrade with telemetry enabled & `oauth2-proxy` in front
Expected Behavior
We have been using Graylog for years with oauth2-proxy in front to make use of SSO with the Trusted Headers method. The expected behavior is that users get logged in and can actually use Graylog.
Current Behavior
After upgrading to 5.2, many people started getting an HTTP 400 error, with no content being displayed at all. Turning off the proxy worked, but meant we had to use passwords again. Interestingly, we noticed that with telemetry disabled (as @hydrapolic mentioned) the UI was usable for some people even with the proxy in front. This is probably an issue with https://github.com/oauth2-proxy/oauth2-proxy; nevertheless, the Graylog UI shouldn't start failing because the telemetry cookie is malformed or similar.
Possible Solution
Don't panic when the posthog cookie is malformed.
Steps to Reproduce (for bugs)
- Run Graylog with `oauth2-proxy` in front (this is very specific, not sure if I can easily create a config to reproduce it)
- Start experiencing errors as long as telemetry is enabled
Context
We were trying to use Graylog.
Your Environment
- Graylog Version: 5.2.3
- Java Version: Eclipse Adoptium 17.0.9
- OpenSearch Version: 2.11.0
- MongoDB Version: 5.0.14
- Operating System: Official container
- Browser version: Chrome 121.0.6167.85
Hey @der-eismann,
thanks for reporting this! Where do users see the HTTP 400? Is there anything in your browser's console when trying to open the web interface?
The error appeared on every page. When removing the posthog cookie and reloading, the page was loading until at some point it suddenly started failing:
Then when pressing F5, there's just the Chrome error page (different URL here because we reproduced it in a separate environment to debug it).
In the console there's just this when removing the cookie and reloading:
GET https://graylog.production.example.com/api/contentstream/tags 400 (Bad Request)
and after pressing F5 again (second screenshot) there's this:
Failed to load resource: the server responded with a status of 400 ()
After some more debugging, I still have no idea why it fails with the proxy. I debug-printed all cookies and compared them with and without oauth2-proxy; there's no difference.
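For anyone who wants to repeat the cookie comparison, here is a minimal sketch of how it could be done, assuming you copy the raw `Cookie` request header from the browser's network tab once with and once without the proxy. The function names and the cookie names in the usage example are placeholders for illustration, not values taken from this setup. It also reports the header size in bytes, since that is easy to overlook when only comparing names and values.

```python
def parse_cookie_header(header: str) -> dict[str, str]:
    """Split a raw Cookie request header into a name -> value mapping.

    Parsed by hand (instead of http.cookies) so that oddly formatted
    values are kept verbatim rather than silently dropped.
    """
    cookies: dict[str, str] = {}
    for part in header.split(";"):
        name, sep, value = part.strip().partition("=")
        if sep and name:
            cookies[name] = value
    return cookies


def diff_cookies(direct: str, proxied: str) -> dict[str, list[str]]:
    """Report cookie names that differ between the two raw headers."""
    a = parse_cookie_header(direct)
    b = parse_cookie_header(proxied)
    return {
        "only_direct": sorted(a.keys() - b.keys()),
        "only_proxied": sorted(b.keys() - a.keys()),
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }


def header_size(header: str) -> int:
    """Size of the header value in bytes as sent on the wire (UTF-8)."""
    return len(header.encode("utf-8"))


if __name__ == "__main__":
    # Placeholder headers; paste the real ones from the network tab.
    direct = "session=abc; ph_x_posthog=1"
    proxied = "session=abc; ph_x_posthog=1; _oauth2_proxy=xyz"
    print(diff_cookies(direct, proxied))
    print(header_size(direct), header_size(proxied))
```

This only compares what the browser sends; it cannot show what the proxy adds or rewrites before the request reaches Graylog.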
Also, it's a mystery to me why the `telemetry_enabled` option isn't documented anywhere. I guess you guys don't want people to disable it, but if it helps as a workaround, it's too hidden.
Hey @der-eismann,
thanks for digging into this. Is there any response body for the requests failing with 400s?
You are right, telemetry_enabled should be documented. Obviously, we want it to be safe for people to use (both technically and with regard to privacy), so that there is little reason to opt out, but there are always situations like this where the option should be documented and known as a last resort if things go sideways otherwise.