nginx problems with RHEL 7.9 under FIPS.

Open bviviano opened this issue 3 years ago • 17 comments

Hello, I am trying to get OnDemand 2.0.28 running on my older RHEL7.9 based cluster (It's working fine on my new RHEL8.6 based cluster). I am running into an issue with nginx when trying to view "My Interactive Sessions", after trying to start a desktop on a compute node:

App 28611 output: md5_dgst.c(82): OpenSSL internal error, assertion failed: Digest MD5 forbidden in FIPS mode!
[ W 2022-08-16 07:01:23.0621 25834/T10 age/Cor/Con/InternalUtils.cpp:96 ]: [Client 15-1] Sending 502 response: application did not send a complete response
[ W 2022-08-16 07:01:25.0854 25834/T3 age/Cor/App/Poo/AnalyticsCollection.cpp:102 ]: Process (pid=28611, group=/var/www/ood/apps/sys/dashboard (production)) no longer exists! Detaching it from the pool.
[ N 2022-08-16 07:01:25.0855 25834/T3 age/Cor/CoreMain.cpp:1147 ]: Checking whether to disconnect long-running connections for process 28611, application /var/www/ood/apps/sys/dashboard (production)

We're required to run our systems in FIPS 140-2 mode (fips=1 on /proc/cmdline). I've run into similar issues on RHEL7 because OpenSSL 1.0.2 doesn't handle MD5 digests/checksums gracefully: it globally disables MD5-related functions in FIPS mode, even if MD5 is only being used to calculate a checksum. OpenSSL 1.1.1 under RHEL8 is more understanding.

Before I start tracing through the OnDemand code, I was wondering if someone could point me at where in the code the MD5 call(s) are made. I have other HPC tools like xCAT that do similar things, and I've had some luck making ad hoc code tweaks to switch from MD5 to SHA256 (which is FIPS compliant), so I wanted to see if I could do something similar here and get OnDemand working on my older RHEL7 cluster.

Of course, if there is a directive/option in ood_portal.yml that I missed to handle this, please let me know. But I don't see any mention of FIPS in the online documentation.

Thanks.

bviviano avatar Aug 16 '22 11:08 bviviano

After some research, I believe the issue is coming from nginx itself; the core problem is that it's linked with OpenSSL 1.0.2. I have OpenSSL 1.1.1 available from EPEL installed on my system.

I am thinking that if I can rebuild ondemand-nginx from source into a replacement RPM, but link it against the OpenSSL 1.1.1 release, that might solve my problem.

I am unable to locate the source package information for ondemand-nginx-1.18.0-2.p6.0.14 anywhere on the OSC GitHub sites. Is there information someplace that I can download/review for how the ondemand-nginx package is being built?

Thanks.

bviviano avatar Aug 16 '22 12:08 bviviano

I'm not sure if it is from nginx.

Can you uncomment this line in /var/www/ood/apps/sys/dashboard/config/initializers/new_framework_defaults_5_2.rb and see if that changes anything?

https://github.com/OSC/ondemand/blob/ca681fd38d5c19b8e10bb9b80f1d144faccf9b90/apps/dashboard/config/initializers/new_framework_defaults_5_2.rb#L35

Unfortunately 2.0 doesn't have a config.load_defaults so I can't tell what the defaults being loaded are.

johrstrom avatar Aug 16 '22 14:08 johrstrom

I am unable to locate the source package information for ondemand-nginx-1.18.0-2.p6.0.14 anywhere on the OSC GitHub sites. Is there information someplace that I can download/review for how the ondemand-nginx package is being built?

SPEC file: https://github.com/OSC/ondemand-packaging/blob/2.0/packages/passenger/rpm/passenger.spec
SRPM: https://yum.osc.edu/ondemand/2.0/web/el7/SRPMS/ondemand-passenger-6.0.14-1.el7.src.rpm

(EDIT to fix SRPM URL)

The build process of NGINX is handled by Passenger so you might be limited in your ability to force a different OpenSSL.

treydock avatar Aug 16 '22 14:08 treydock

@johrstrom Making that change did not help. I will see what I can do with getting nginx compiled with a newer OpenSSL. Will update this ticket if I make any progress. Thanks.

bviviano avatar Aug 16 '22 16:08 bviviano

@treydock Do you happen to have the required rpmbuild command to rebuild ondemand-passenger?

I am getting

$ rpmbuild --rebuild ondemand-passenger-6.0.14-1.el7.src.rpm 
Installing ondemand-passenger-6.0.14-1.el7.src.rpm
error: line 3: Unknown tag: %scl_package passenger

I expect there are some additional command line options I need, or macros that need to be defined somewhere, specific to building the RPM package.

Thanks.

bviviano avatar Aug 16 '22 16:08 bviviano

You have to build with the SCL build package scl-utils-build installed.

It's going to be rather difficult to replicate our build steps using the raw rpmbuild command, because we abstract all the build steps in the ondemand-packaging repo: https://github.com/OSC/ondemand-packaging/tree/2.0

If you have a Ruby environment and Docker (or Podman) you can build using our build tools.

git clone --single-branch --branch 2.0 https://github.com/OSC/ondemand-packaging.git
cd ondemand-packaging
# make changes to packages/passenger/rpm/passenger.spec
bundle install --path vendor/bundle
bundle exec rake ood_packaging:package:passenger[el7]

treydock avatar Aug 16 '22 16:08 treydock

@treydock Thanks. I was able to rpmbuild --rebuild the ondemand-passenger-6.0.14-1.el7.src.rpm package after I installed ondemand-scldevel and ondemand-build. I made the changes to the passenger.spec file to get nginx to link with OpenSSL 1.1.1k:

[root@atmos6 ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux Server release 7.9 (Maipo)

[root@atmos6 ~]# /opt/ood/ondemand/root/usr/sbin/nginx -V
nginx version: nginx/1.18.0
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) 
built with OpenSSL 1.1.1k  FIPS 25 Mar 2021
TLS SNI support enabled
configure arguments: --prefix=/opt/rh/ondemand/root/usr/share/nginx --sbin-path=/opt/rh/ondemand/root/usr/sbin/nginx --conf-path=/opt/rh/ondemand/root/etc/nginx/nginx.conf --error-log-path=/var/log/ondemand-nginx/error.log --http-log-path=/var/log/ondemand-nginx/access.log --http-client-body-temp-path=/var/lib/ondemand-nginx/tmp/client_body --http-proxy-temp-path=/var/lib/ondemand-nginx/tmp/proxy --http-fastcgi-temp-path=/var/lib/ondemand-nginx/tmp/fastcgi --http-uwsgi-temp-path=/var/lib/ondemand-nginx/tmp/uwsgi --http-scgi-temp-path=/var/lib/ondemand-nginx/tmp/scgi --pid-path=/run/ondemand-nginx.pid --lock-path=/run/lock/subsys/ondemand-nginx --user=ondemand-nginx --group=ondemand-nginx --with-file-aio --with-http_ssl_module --with-http_v2_module --with-http_realip_module --with-http_addition_module --with-http_xslt_module --with-http_image_filter_module --with-http_sub_module --with-http_dav_module --with-http_flv_module --with-http_mp4_module --with-http_gunzip_module --with-http_gzip_static_module --with-http_random_index_module --with-http_secure_link_module --with-http_degradation_module --with-http_stub_status_module --with-mail --with-mail_ssl_module --with-pcre --with-pcre-jit --add-module=../src/nginx_module --with-debug --with-cc-opt='-I /usr/include/openssl11 -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -grecord-gcc-switches -m64 -mtune=generic' --with-ld-opt='-Wl,-z,relro -Wl,-E'

But I still get the same error message in the log

App 17902 output: md5_dgst.c(82): OpenSSL internal error, assertion failed: Digest MD5 forbidden in FIPS mode!
[ W 2022-08-16 14:12:31.2892 16513/Ti age/Cor/Con/InternalUtils.cpp:96 ]: [Client 6-1] Sending 502 response: application did not send a complete response
[ W 2022-08-16 14:12:35.0865 16513/T3 age/Cor/App/Poo/AnalyticsCollection.cpp:102 ]: Process (pid=17902, group=/var/www/ood/apps/sys/dashboard (production)) no longer exists! Detaching it from the pool.
[ N 2022-08-16 14:12:35.0865 16513/T3 age/Cor/CoreMain.cpp:1147 ]: Checking whether to disconnect long-running connections for process 17902, application /var/www/ood/apps/sys/dashboard (production)

If you have any other suggestions about where/what I might try to get past this, I'd appreciate it.

My fall back would be to see if I could deploy the entire SCL setup with OnDemand into a RHEL8 Docker container and run it on my RHEL7 system.

bviviano avatar Aug 16 '22 18:08 bviviano

@bviviano be sure to bounce your PUN after you edited that /var/www/ood file. If you can't access the button directly, you can hit this path on your server to bounce it --- /nginx/stop?redir=/pun/sys/dashboard.

I'm fairly sure this is coming out of the Rails dashboard, because "App 17902 output:" suggests that pid 17902 is the Rails app (and that's its output), and "502 response: application did not send a complete response" means nginx sent the request but the app (the Rails dashboard) itself did not send a complete response. I.e., it crashed.
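
To make that reading of the log concrete, here is a minimal Ruby sketch that pulls the PID and message out of "App <pid> output:" lines. The regular expression and the helper name `app_output` are assumptions based only on the log excerpts quoted in this thread, not on any documented Passenger log format.

```ruby
# Hypothetical helper: extracts the PID and message from lines of the form
# "App <pid> output: <message>", as seen in the excerpts in this thread.
APP_OUTPUT = /\AApp (\d+) output: (.*)\z/

def app_output(line)
  match = APP_OUTPUT.match(line.chomp)
  return nil unless match

  { pid: match[1].to_i, message: match[2] }
end

line = 'App 17902 output: md5_dgst.c(82): OpenSSL internal error, ' \
       'assertion failed: Digest MD5 forbidden in FIPS mode!'
entry = app_output(line)
puts "pid=#{entry[:pid]} message=#{entry[:message]}" if entry
```

Lines that do not start with "App <pid> output:" (for example the bracketed Passenger core messages) return nil, so the helper can be used to filter an error.log down to what the application itself printed.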

johrstrom avatar Aug 16 '22 18:08 johrstrom

@johrstrom Thanks. I did restart the nginx process from Restart Web Server and verified on the system running nginx it got a new PID. So, I know it did restart correctly.

I also used your method of /nginx/stop?redir=/pun/sys/dashboard with the same results. So, if it's in the Ruby code and not nginx, I'll have to chase it down there and see why the

Rails.application.config.active_support.use_sha1_digests = true 

setting didn't take/work as expected. SHA1 is allowed in FIPS mode on this system, while MD5 is not:

[root@atmos6 ~]# fipscheck 
fips mode is on

[root@atmos6 ~]# openssl md5 /dev/null
Error setting digest md5
47669739415952:error:060800A3:digital envelope routines:EVP_DigestInit_ex:disabled for fips:digest.c:256:

[root@atmos6 ~]# openssl sha1 /dev/null
SHA1(/dev/null)= da39a3ee5e6b4b0d3255bfef95601890afd80709
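
The same FIPS check can be sketched from Ruby. This is a minimal sketch, assuming the kernel exposes its FIPS flag at /proc/sys/crypto/fips_enabled (a path not mentioned in this thread); the helper name `fips_enabled?` is hypothetical.

```ruby
# Hypothetical helper: reports whether the kernel FIPS flag is set.
# Assumes the flag file contains "1" when FIPS mode is on.
def fips_enabled?(path = '/proc/sys/crypto/fips_enabled')
  File.exist?(path) && File.read(path).strip == '1'
end

mode = fips_enabled?('/proc/sys/crypto/fips_enabled') ? 'on' : 'off'
puts "fips mode is #{mode}"
```

The sketch deliberately avoids calling Digest::MD5 as a probe: as the log in this thread shows, the call ends in an OpenSSL assertion failure that takes the process down, so there would be nothing to rescue.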

bviviano avatar Aug 16 '22 18:08 bviviano

Sorry I'm just now seeing the active_support which we don't really use. Let me see where else this could be set...

johrstrom avatar Aug 16 '22 18:08 johrstrom

Is there a way to turn up the debugging output so I could more easily locate where in the code the problem is happening?

You said the App 20523 entry, that 20523 was the PID from the dashboard, so is there a way to log output telling me what command/process PID 20523 actually was? I checked and once I hit the error, whatever PID is logged in the /var/log/ondemand-nginx/USER/error.log is no longer running.

Thanks.

bviviano avatar Aug 16 '22 18:08 bviviano

OK - looks like we ourselves are using this library so that could be the issue as well.

apps/dashboard/app/models/batch_connect/session.rb:      Digest::MD5.hexdigest(hsh.to_json)
apps/dashboard/app/views/batch_connect/sessions/connections/_native_vnc.html.erb:<% localport = Digest::MD5.hexdigest("#{connect.host}:#{connect.port}").to_i(16) % 55535 + 10000 %>

The first file, apps/dashboard/app/models/batch_connect/session.rb, would be /var/www/ood/apps/sys/dashboard/app/models/batch_connect/session.rb on your system. You should be able to replace MD5 with SHA1 directly, so you could write a sed command to find and replace them.
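
As a quick sanity check on the second hit, here is a minimal Ruby sketch of the port expression with the digest class swapped. The helper name `local_port` and the example host are hypothetical; the arithmetic is copied from the _native_vnc.html.erb line above.

```ruby
require 'digest'

# Hypothetical helper mirroring the expression in _native_vnc.html.erb,
# with the digest class made a parameter (SHA1 by default).
def local_port(host, port, digest_class: Digest::SHA1)
  digest_class.hexdigest("#{host}:#{port}").to_i(16) % 55535 + 10000
end

# "node001.example.com" is a made-up host, used only for illustration.
port = local_port('node001.example.com', 5901)
puts port
```

Because of the `% 55535 + 10000`, the result always lands between 10000 and 65534 whichever digest is used. Only the specific port chosen for a given host:port pair changes when MD5 is replaced.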

johrstrom avatar Aug 16 '22 18:08 johrstrom

@johrstrom Thanks. That was what I had first thought, but there are a lot of places the Digest::MD5 class gets included based on a quick find/grep.

Tomorrow I'll poke through the code and see if I can make any progress.
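
For reference, the find/grep pass can be sketched in Ruby as well. This is a rough equivalent, not the command actually used here; the helper name `find_md5_usages` is hypothetical, and it does no error handling for unreadable files.

```ruby
require 'find'

# Hypothetical helper: returns "path:line: text" for every line under +root+
# that mentions +pattern+, roughly what a recursive grep would report.
def find_md5_usages(root, pattern: 'Digest::MD5')
  hits = []
  Find.find(root) do |path|
    next unless File.file?(path)

    File.foreach(path).with_index(1) do |text, lineno|
      clean = text.scrub # guard against invalid byte sequences in binary files
      hits << "#{path}:#{lineno}: #{clean.strip}" if clean.include?(pattern)
    end
  end
  hits
end

# Example (path taken from this thread):
# puts find_md5_usages('/var/www/ood/apps/sys/dashboard')
```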

bviviano avatar Aug 16 '22 18:08 bviviano

@johrstrom using find/grep and then sed, I made the following changes under /var/www/ood

/bin/sed -i 's/Digest::MD5/Digest::SHA1/g' /var/www/ood/apps/sys/dashboard/app/models/batch_connect/session.rb
/bin/sed -i 's/Digest::MD5/Digest::SHA1/g' /var/www/ood/apps/sys/dashboard/app/views/batch_connect/sessions/connections/_native_vnc.html.erb

With these changes, I am able to get to My Interactive Sessions on the dashboard and launch VNC to get an XFCE desktop on a compute node no problem, like I can on my RHEL 8.6 cluster.
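
A Ruby variant of the same edit that keeps a backup copy might look like the following. It is a sketch only: the helper name `swap_md5_for_sha1` and the ".orig" suffix are assumptions, and it needs write access to the file.

```ruby
require 'fileutils'

# Hypothetical helper: swaps Digest::MD5 for Digest::SHA1 in one file,
# keeping a ".orig" copy. Returns the number of replacements made.
def swap_md5_for_sha1(path)
  source = File.read(path)
  count = source.scan('Digest::MD5').length
  return 0 if count.zero?

  FileUtils.cp(path, "#{path}.orig")
  File.write(path, source.gsub('Digest::MD5', 'Digest::SHA1'))
  count
end

# Example (path taken from this thread):
# swap_md5_for_sha1('/var/www/ood/apps/sys/dashboard/app/models/batch_connect/session.rb')
```

Like the sed commands, this is a plain text substitution, so it is best pointed at individual files. Run blindly over a gem tree it would also rewrite names such as Rack::Auth::Digest::MD5, which is a class name rather than a call into the digest library.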

There are also some places under /opt/ood that use Digest::MD5. I'm going to ignore http_authentication.rb, since we're using DEX, and you said OnDemand isn't using activesupport, but do any of the following files need to be adjusted? I don't want to just go through wholesale and replace all instances of Digest::MD5 with Digest::SHA1. Law of unintended consequences and all :).

/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.28/gems/activestorage-5.2.8.1/app/models/active_storage/blob.rb:      Digest::MD5.new.tap do |checksum|
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.28/gems/activestorage-5.2.8.1/lib/active_storage/service/disk_service.rb:        unless Digest::MD5.file(path_for(key)).base64digest == checksum
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.28/gems/dalli-3.2.0/lib/dalli/key_manager.rb:      digest_class: ::Digest::MD5
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.28/gems/mini_portile2-2.6.1/lib/mini_portile2/mini_portile.rb:        when exp=file[:md5] then Digest::MD5
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.28/gems/mini_portile2-2.6.1/test/test_digest.rb:    download_with_digest(:md5, Digest::MD5)
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.28/gems/mini_portile2-2.6.1/test/test_download.rb:      Digest::MD5.file(dest).hexdigest.must_equal "5deffb997041bbb5f11bdcafdbb47975"
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.28/gems/rack-2.2.4/lib/rack/auth/digest/md5.rb:          ::Digest::MD5.hexdigest(data)
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.28/gems/rack-2.2.4/lib/rack/auth/digest/nonce.rb:          ::Digest::MD5.hexdigest("#{@timestamp}:#{self.class.private_key}")
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.28/gems/rack-test-2.0.2/lib/rack/test/mock_digest_request.rb:        Rack::Auth::Digest::MD5.new(nil).send :digest, self, password
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.28/gems/sass-listen-4.0.0/lib/sass-listen/file.rb:      md5 = Digest::MD5.file(path).digest
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.28/gems/sprockets-3.7.2/lib/sprockets/digest_utils.rb:      16 => Digest::MD5,
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.28/gems/thor-0.19.1/lib/thor/runner.rb:      :filename   => Digest::MD5.hexdigest(name + as),
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.28/gems/thor-0.19.1/spec/runner_spec.rb:        path = File.join(Thor::Util.thor_root, Digest::MD5.hexdigest(@location + "random"))
/opt/ood/ondemand/root/usr/share/gems/2.7/ondemand/2.0.28/gems/websocket-driver-0.7.5/lib/websocket/driver/draft76.rb:        Digest::MD5.digest((@key_values + [head]).pack('N2A*'))

Thanks for your assistance.

bviviano avatar Aug 17 '22 11:08 bviviano

I'd leave well enough alone if it works for you. If it comes back up - I'd look at maybe rack, but other than that those are testing, CLI and building gems. Which is to say, rack is the only real runtime gem I see there, besides activestorage which we don't use.

johrstrom avatar Aug 17 '22 13:08 johrstrom

Ok. Thanks. Yeah, everything appears to be working correctly for VNC and JupyterLab (which are the two main apps we have deployed in our OnDemand setup). So, I think we're good. Again, appreciate the assistance.

bviviano avatar Aug 17 '22 13:08 bviviano

Thanks, I'm going to leave this issue open to actually replace MD5 with SHA1 here upstream, so you don't have to continually make edits.

johrstrom avatar Aug 17 '22 13:08 johrstrom