ondemand icon indicating copy to clipboard operation
ondemand copied to clipboard

3.1.9 seems to break the Shell app due to CSRF token issue

Open jjackzhn opened this issue 1 year ago • 25 comments

After updating from 3.1.7 to 3.1.9, the shell app immediately crashes with the message:

Your connection to the remote server has been terminated.

Looking at the logs I found

App 4042244 output: Listening on 3000
App 4042244 output: Connection established
App 4042244 output: /var/www/ood/apps/sys/shell/app.js:161
App 4042244 output:   token = req.url.match(/csrf=([^&]*)/)[1];
App 4042244 output:                                        ^
App 4042244 output:
App 4042244 output: TypeError: Cannot read properties of null (reading '1')
App 4042244 output:     at WebSocketServer.connection (/var/www/ood/apps/sys/shell/app.js:161:40)
App 4042244 output:     at WebSocketServer.emit (node:events:517:28)
App 4042244 output:     at done (/var/www/ood/apps/sys/shell/app.js:233:9)
App 4042244 output:     at WebSocketServer.completeUpgrade (/var/www/ood/apps/sys/shell/node_modules/ws/lib/websocket-server.js:435:5)
App 4042244 output:     at WebSocketServer.handleUpgrade (/var/www/ood/apps/sys/shell/node_modules/ws/lib/websocket-server.js:343:10)
App 4042244 output:     at Server.upgrade (/var/www/ood/apps/sys/shell/app.js:232:7)
App 4042244 output:     at Server.emit (node:events:517:28)
App 4042244 output:     at onParserExecuteCommon (node:_http_server:939:14)
App 4042244 output:     at onParserExecute (node:_http_server:825:3) 

I made it print out req.url and it's the bare URL to the shell app without a CSRF token.

3.1.7 does not have this issue since downgrading fixes it.

jjackzhn avatar Oct 22 '24 19:10 jjackzhn

I can't replicate. Note that it's the javascript making this network call with a CSRF token, not the original URL you put in your browser.

Here's an image of my console tab for the same. You'll note that the first request is what you enter in the URL (and the response is an html page), then later the browser makes another request to basically the same URL only with wss protocol instead of http.

image

johrstrom avatar Oct 22 '24 19:10 johrstrom

Didn't mean to close the ticket, only comment on it. Can you show a similar network tab?

I double checked the diff between the two versions this morning and can't find/see how the ping pong change changed/broke functionality to that point.

I'd also ask if you can try 3.1.9 with a private window or after clearing your cache to see if that's any different.

https://github.com/OSC/ondemand/compare/v3.1.7...v3.1.9

johrstrom avatar Oct 22 '24 19:10 johrstrom

Given the changelog, I'd pin this on updates we had to make to our lua code to support the latest version of httpd (or apache2 as the case may be). What apache2/httpd version are you using?

johrstrom avatar Oct 22 '24 20:10 johrstrom

Here's a screenshot - I do see both the original request and the follow-up wss request. ood-csrf

I'm seeing the same behavior with incognito after clearing cache and doing a hard reload.

We are using httpd 2.4.37 from RHEL 8. I'm viewing it with Chrome on Windows 10 22H2.

jjackzhn avatar Oct 22 '24 21:10 jjackzhn

Quite a few reds and oranges in the console... ood-console

Edit: The websocket error is the only one that doesn't show up in 3.1.7. Detail: ood-error

jjackzhn avatar Oct 22 '24 21:10 jjackzhn

Unfortunately I can't replicate on Rocky 8.8. Even 8.8 has httpd-2.4.37-65. I'll try to track down a system that has httpd-2.4.37-56, but I'm guessing the fixes we put in place for 2.4.62 somehow broke httpd-2.4.37-56 (assuming that's the version you're on).

I'll also have to look into the difference between those 56 and 65 releases.

johrstrom avatar Oct 23 '24 13:10 johrstrom

Yes, I have 2.4.37-56. I'll also try to find something different to test on.

jjackzhn avatar Oct 23 '24 14:10 jjackzhn

Is it possible to update httpd? I'm unable to get an OS with that version. Everything on appstream has been updated.

I assume you're on some EL/8.8 variant?

johrstrom avatar Oct 23 '24 14:10 johrstrom

@johrstrom You can enable the Vault for Rocky to get older packages, ie: https://dl.rockylinux.org/vault/rocky/8.8/AppStream/x86_64/os/Packages/h/. Has https://dl.rockylinux.org/vault/rocky/8.8/AppStream/x86_64/os/Packages/h/httpd-2.4.37-56.module%2Bel8.8.0%2B1456%2Bd0a01c5e.7.x86_64.rpm.

You can inject this URL into something like /etc/yum.repos.d/Rocky-AppStream.repo and comment out mirrorlist and set the Vault URL at https://dl.rockylinux.org/vault/rocky/8.8/AppStream/x86_64/os/ to the value for baseurl.

treydock avatar Oct 23 '24 14:10 treydock

@johrstrom You can enable the Vault for Rocky to get older packages

Thanks, I'll take a look at it.

johrstrom avatar Oct 23 '24 14:10 johrstrom

OK I can replicate if I downgrade back to 2.4.37-56. It is infact coming from the lua updates we made to support 2.4.62...

johrstrom avatar Oct 23 '24 15:10 johrstrom

But that brings another question - If I have to install the version from the vault should we even support it? Is there a way to safely require httpd >= 2.4.37-56? The patch numbers could vary from distro to distro in theory, but do they in practice?

johrstrom avatar Oct 23 '24 15:10 johrstrom

Latest versions for EL8:

Rocky 8 - 2.4.37-65 Alma 8 - 2.4.37-65 RHEL 8 - 2.4.37-65

The checksum and other parts of the appstream release do differ but I think for RHEL8 (and 9 most likely) we could require a minimum release based on what works with OnDemand.

Where this could run into issues is folks on RHEL 8 can pin to like RHEL 8.8 and if we were to require something from RHEL 8.10, that would break for them but also if that requirement is based on what works for OnDemand, they have no choice but to upgrade.

If 2.4.37-56 is the issue I think we'd have to say:

Requires: > 2.4.37-56 < 2.5

https://github.com/OSC/ondemand-packaging/blob/f777543e6818c47063ab1c1f4287493b32b634cb/packages/ondemand-runtime/rpm/ondemand-runtime.spec#L108

treydock avatar Oct 23 '24 15:10 treydock

Where this could run into issues is folks on RHEL 8 can pin to like RHEL 8.8 and if we were to require something from RHEL 8.10

Surprisingly enough, dnf update updates from 8.8 to 8.10. But getting a 8.8 container this morning without running dnf update installed httpd-2.4.37-65 from appstream even though it was an 8.10 module.

How can I test pinning the OS to 8.8?

image

johrstrom avatar Oct 23 '24 15:10 johrstrom

I found these 2 advisories, so I'm a bit convinced we should just require that -65 patch version. I can't imagine someone saying yes I need pin httpd to this version with known security advisories against it - and if so, then they can just stay on 3.1.7?

https://access.redhat.com/errata/RHSA-2024:5193 https://access.redhat.com/errata/RHSA-2024:4197

johrstrom avatar Oct 23 '24 18:10 johrstrom

Sounds like a plan. I'll update to RHEL 8.10.

jjackzhn avatar Oct 23 '24 18:10 jjackzhn

Sounds like a plan. I'll update to RHEL 8.10.

You don't need to fully upgrade to 8.10. Just an update to httpd should suffice.

johrstrom avatar Oct 23 '24 18:10 johrstrom

We have a mixture of RHEL 8.8 EUS, RHEL 8.10 and RHEL/Rocky 9.4; this would be a good reason to move things off 8.8 EUS anyway :)

jjackzhn avatar Oct 23 '24 18:10 jjackzhn

How can I test pinning the OS to 8.8?

This is done with subscription manager on RHEL systems as RHEL provides minor release repos. The only way to stay on a minor release on Rocky that is not the latest minor release would be to start with 8.8 container then replace all repos with 8.8 Vault URLs.

On RHEL you'd do

sudo subscription-manager release --set=8.8

Which would update the repo URLs in /etc/yum.repos.d/redhat.repo. I don't think this really works with Rocky/Alma since subscription-manager is designed to talk to RedHat systems about the system's valid subscription with RedHat.

treydock avatar Oct 23 '24 19:10 treydock

I see. Yea then a locked RHEL 8.8 system won't support 3.1.9. Hrrrrm. OK that may just be the best we can do. I can't figure out how to get the apache version in lua without shelling out a rpm -q httpd query and parsing it or similar which seems very complicated and moreover would slow down basically every request. So much so that it seems untenable.

johrstrom avatar Oct 23 '24 19:10 johrstrom

get the apache version in lua without shelling out a rpm -q httpd query and parsing it

Ya that sounds not great. The release version only exists inside the RPM metadata, I don't think Apache itself knows its 2.4.x-y it only knows it's 2.4.x.

treydock avatar Oct 23 '24 19:10 treydock

I'm not sure if it matters, but I wanted to note that I hit this bug on a Rocky 9.4 host with httpd-2.4.57-8.el9.x86_64. We have a development host also running Rocky 9.4 that updated OnDemand at the same time, but it has httpd-2.4.57-11.el9_4.1.x86_64 and did not exhibit the problem. We fixed it by downgrading the OnDemand package since that upgrade was inadvertent, but just want to make sure you knew that this problem existed on Rocky/RHEL 9 as well as 8.

mrobbert avatar Oct 24 '24 20:10 mrobbert

Rocky 9.4 host with httpd-2.4.57-8.el9.x86_64

It certainly does matter because that means specifying Requires: > 2.4.37-56 < 2.5 isn't a silver bullet I thought it may be.

johrstrom avatar Oct 24 '24 20:10 johrstrom

It certainly does matter because that means specifying Requires: > 2.4.37-56 < 2.5 isn't a silver bullet I thought it may be.

We can have RHEL8 and RHEL9 specific requirements. We already do that for things like NodeJS because of different versions available:

https://github.com/OSC/ondemand-packaging/blob/8abf784f09b65b1392651fd0694155d62b8d0e29/packages/ondemand-runtime/rpm/ondemand-runtime.spec#L88-L96

My guess is whatever patch RedHat backported, it was applied to 2.4.37 for RHEL8 and 2.4.57 for RHEL9 as RHEL tends to pin their software versions though less so with AppStream

treydock avatar Oct 24 '24 21:10 treydock

So we could do

%if 0%{?rhel} == 8
Requires: %{apache} > 2.4.37-56, %{apache} < 2.5
...
%endif
%if 0%{?rhel} == 9
Requires: %{apache} > 2.4.57-8, %{apache} < 2.5
...
%endif

treydock avatar Oct 24 '24 21:10 treydock

I am having this issue, with the following installation:

httpd-2.4.37-65.module+el8.10.0+1830+22f0c9e0.x86_64

$ cat /etc/redhat-release Rocky Linux release 8.10 (Green Obsidian)

App 2203442 output: /var/www/ood/apps/sys/shell/app.js:161 App 2203442 output: token = req.url.match(/csrf=([^&]*)/)[1]; App 2203442 output: ^ App 2203442 output: App 2203442 output: TypeError: Cannot read properties of null (reading '1') App 2203442 output: at WebSocketServer.connection (/var/www/ood/apps/sys/shell/app.js:161:40) App 2203442 output: at WebSocketServer.emit (node:events:517:28) App 2203442 output: at done (/var/www/ood/apps/sys/shell/app.js:233:9) App 2203442 output: at WebSocketServer.completeUpgrade (/var/www/ood/apps/sys/shell/node_modules/ws/lib/websocket-server.js:435:5) App 2203442 output: at WebSocketServer.handleUpgrade (/var/www/ood/apps/sys/shell/node_modules/ws/lib/websocket-server.js:343:10) App 2203442 output: at Server.upgrade (/var/www/ood/apps/sys/shell/app.js:232:7) App 2203442 output: at Server.emit (node:events:517:28) App 2203442 output: at onParserExecuteCommon (node:_http_server:939:14) App 2203442 output: at onParserExecute (node:_http_server:825:3) App 2203442 output: App 2203442 output: Node.js v18.20.2

Xaraxia avatar Nov 07 '24 02:11 Xaraxia

From the browser it looks fine:

token = window.urlWithToken().match(/csrf=([^&]*)/)[1]; 'AVPiWh3i-OEzDDpVlDgafakedatafakedata' window.urlWithToken() 'wss://bunya-ondemand.rcc.uq.edu.au/pun/sys/shell/ssh/bunya.rcc.uq.edu.au?csrf=AVPiWh3i-OEzDDpVlDgafakedatafakedata'

So, potentially, something is preventing the token from being passed back to nodejs or otherwise mangling that URL in such a way that it is not being detected correctly.

Xaraxia avatar Nov 07 '24 05:11 Xaraxia

So, potentially, something is preventing the token from being passed back to nodejs or otherwise mangling that URL in such a way that it is not being detected correctly.

The update to apache should have resolved this... Did you bounce apache after updating to ensure that the new version is running? That's the only thing I can think of. I know for sure this issue is caused by the 3.1.9 update using a version of apache that's lower than httpd-2.4.37-65 (what you've reported to have).

johrstrom avatar Nov 07 '24 14:11 johrstrom

Yes, and I just did a systemctl restart again to be on the safe side.

11:40 sbw@bunya-ondemand:~$ rpm -qa | grep httpd httpd-filesystem-2.4.37-65.module+el8.10.0+1830+22f0c9e0.noarch httpd-devel-2.4.37-65.module+el8.10.0+1830+22f0c9e0.x86_64 httpd-2.4.37-65.module+el8.10.0+1830+22f0c9e0.x86_64 httpd-tools-2.4.37-65.module+el8.10.0+1830+22f0c9e0.x86_64 rocky-logos-httpd-86.3-1.el8.noarch

11:41 sbw@bunya-ondemand:~$ sudo /usr/sbin/httpd -v Server version: Apache/2.4.37 (Rocky Linux) Server built: Jul 1 2024 14:01:13

Xaraxia avatar Nov 12 '24 01:11 Xaraxia

I'm so sorry, but I can't replicate our systems or in a container with Rocky 8. httpd-2.4.37-65 (at least in a container when I test) should work well for you.

The updates we made (that apparently broke older httpds) were to support higher versions of httpd. So we can't really go back on them/revert them. I'm not really sure what to do because it really should be working for you, I'm just not sure why. You haven't edited those lua files have you? Just trying to cover what else could be the issue.

johrstrom avatar Nov 12 '24 14:11 johrstrom