
Support building with OpenSSL 3.x

Open geofft opened this issue 1 year ago • 2 comments

OpenSSL 1.x reaches end-of-life in September, and recent distros like Ubuntu 22.04+ (last year) and Debian 12+ (next month) ship only OpenSSL 3.

I have gloo (inside PyTorch) working with OpenSSL 3.x, and as far as I can tell everything works fine. The APIs it uses are both API- and ABI-compatible between 1.1 and 3.x. (This is important because PyTorch configures gloo with USE_TCP_OPENSSL_LOAD, i.e., it dlopens the library instead of compiling against it.) But there are a few things to adjust:

  1. In #306 cmake does find_package(OpenSSL 1.1 REQUIRED EXACT), which fails on 3.0. Something like find_package(OpenSSL 1.1...<4.0 REQUIRED) would be better. Alternatively, perhaps this shouldn't be invoked at all in the USE_TCP_OPENSSL_LOAD case, since OpenSSL isn't needed at build time then?
  2. gloo/transport/tcp/tls/openssl.cc attempts to dlopen libssl.so if present, else libssl.so.1.1. The first library is only available if the development package for OpenSSL is installed, and the development package can be any version (3.x, 4.x, etc.). It's probably safer to try only the versioned names, libssl.so.1.1 and libssl.so.3 (all 3.x releases use the same soname).
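
To illustrate the second point, here is a minimal sketch of loading by versioned soname only. It is not gloo's actual code: the names `Loader`, `dlopenLoader`, `openFirstAvailable`, and `libsslSonames` are hypothetical, and trying libssl.so.3 before libssl.so.1.1 is an assumption, since the order is not specified above.

```cpp
#include <dlfcn.h>

#include <functional>
#include <string>
#include <vector>

// Hypothetical helper types and functions for illustration only;
// none of these names come from gloo.
using Loader = std::function<void*(const char*)>;

// Thin wrapper around dlopen(3). Older glibc versions need -ldl at link time.
inline void* dlopenLoader(const char* name) {
  return dlopen(name, RTLD_NOW | RTLD_LOCAL);
}

// Try each soname in order and return the first handle that loads.
// If loadedName is non-null, it receives the name that succeeded.
inline void* openFirstAvailable(const std::vector<std::string>& sonames,
                                const Loader& load,
                                std::string* loadedName = nullptr) {
  for (const auto& name : sonames) {
    if (void* handle = load(name.c_str())) {
      if (loadedName != nullptr) {
        *loadedName = name;
      }
      return handle;
    }
  }
  return nullptr;  // no candidate could be loaded
}

// Versioned sonames only: the unversioned libssl.so comes from the
// development package and may point at any OpenSSL version.
// Assumption: prefer 3.x, then fall back to 1.1.
inline const std::vector<std::string>& libsslSonames() {
  static const std::vector<std::string> names = {"libssl.so.3",
                                                 "libssl.so.1.1"};
  return names;
}
```

A caller would use `openFirstAvailable(libsslSonames(), dlopenLoader)` and treat a null result as "TLS unavailable". Passing the loader as a parameter is only a convenience so the fallback order can be exercised without OpenSSL installed.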

If a PR is helpful I can do the CLA dance but hopefully this is simple enough that the more interesting thing is agreeing on what the change is.

geofft avatar May 08 '23 23:05 geofft

OpenSSL 1.1.x is EOL on 2023-09-11.

@xunnanxu could you take a look please?

thesamesam avatar Sep 05 '23 11:09 thesamesam

Seems pretty reasonable to me. That said, I'm probably not the right person to review this. Maybe consider opening a linked issue in the PyTorch codebase and tagging it with oncall: distributed to make sure this gets properly reviewed?

xunnanxu avatar Sep 06 '23 04:09 xunnanxu