ssl: support IO-like object as the underlying transport
An implementation of #731. The test suite passes on my box, but it needs more testing, especially around error handling.
This adds support for IO-like objects that are not backed by a file descriptor by defining a BIO_METHOD that wraps the following Ruby methods:
- #read_nonblock(len, exception: false)
- #write_nonblock(str, exception: false)
- #wait_readable
- #wait_writable
- #flush
- #closed?
- #close
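For illustration, a minimal in-memory object satisfying this interface might look like the sketch below. This is not code from the PR; the class name `InMemoryIO` and the `#feed` helper are invented for this example.

```ruby
# Hypothetical sketch of an IO-like object that is not backed by a file
# descriptor but responds to the methods the BIO_METHOD wrapper calls.
class InMemoryIO
  attr_reader :wbuf

  def initialize
    @rbuf = +""   # bytes available to #read_nonblock
    @wbuf = +""   # bytes collected from #write_nonblock
    @closed = false
  end

  # Test helper (invented for this sketch): make bytes readable.
  def feed(str)
    @rbuf << str
    self
  end

  def read_nonblock(len, exception: true)
    raise IOError, "closed stream" if @closed
    if @rbuf.empty?
      # Mirror IO#read_nonblock: raise, or return :wait_readable when
      # called with exception: false.
      exception ? raise(IO::EAGAINWaitReadable) : :wait_readable
    else
      @rbuf.slice!(0, len)
    end
  end

  def write_nonblock(str, exception: true)
    raise IOError, "closed stream" if @closed
    str = str.to_str
    @wbuf << str
    str.bytesize
  end

  # In-memory buffers are always "ready"; a real object would block here.
  def wait_readable(timeout = nil)
    true
  end

  def wait_writable(timeout = nil)
    true
  end

  def flush
    self
  end

  def closed?
    @closed
  end

  def close
    @closed = true
    nil
  end
end
```

With this PR, an object shaped like this could be passed to `OpenSSL::SSL::SSLSocket.new` in place of a real socket.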
@rhenium anything I can help with to move this forward?
@rhenium friendly ping
I just pushed what I have in my local branch so far, rebased on top of current master to resolve merge conflicts. This is not ready to merge yet.
Some unresolved issues:
- We have many configurable callbacks that may be triggered by various methods on `SSLSocket`. If a callback attempts a tag jump (e.g., raising an exception), it is caught by `rb_protect()` and the jump tag is stored in the `ID_callback_state` ivar. After returning from the OpenSSL function, the ivar is checked and the jump is resumed. Currently, if a callback fails, no other callbacks will be executed from the same OpenSSL function call (unless I'm mistaken), so this works as expected.

  A jump from some callbacks is considered critical to the connection, and a TLS alert must be sent to the peer. For example, an exception in `SSLContext#servername_cb` should cause an `unrecognized_name` alert to be sent to the client. Since writing to the underlying socket requires running Ruby code, this means we need to execute Ruby code while a jump is still on hold. This PR currently doesn't try to handle this, and it's possible to cause a segfault (as you can see in the two failing test cases I added in a `[DO NOT MERGE]` commit). How should this be fixed?

  I initially tried saving `rb_errinfo()` temporarily and restoring it later with `rb_set_errinfo()`, but this can't work for all cases because `rb_set_errinfo()` only accepts exceptions.
- If both a callback and the underlying socket do a jump in the same single OpenSSL function call, which one should take precedence?
- OpenSSL suppresses a certain kind of error when writing a TLS 1.3 `NewSessionTicket` to the underlying socket, when it is harmless (https://github.com/openssl/openssl/blob/e599893a9fec932701ca824d73a794a0c9ce02e9/ssl/statem/statem_srvr.c#L852-L870). While we could rescue exceptions like `Errno::ECONNRESET` raised by `underlying_socket.write_nonblock` to replicate the behavior, I'm not sure if that would be a good idea. This PR currently doesn't handle this, so this is a slight difference compared to using the socket BIO.
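For the last point, the rescue being considered could look roughly like the sketch below. This is only an illustration of the idea, not what the PR does; `FakeSocket` and `write_session_ticket` are invented names.

```ruby
# Hypothetical sketch of suppressing a harmless write error the way
# OpenSSL does for TLS 1.3 NewSessionTicket delivery. FakeSocket stands
# in for an underlying IO-like object whose peer has already gone away.
class FakeSocket
  def write_nonblock(str, exception: true)
    raise Errno::ECONNRESET, "peer reset the connection"
  end
end

# Invented helper: attempt to deliver a session ticket, treating a
# reset/broken peer as harmless rather than propagating the error.
def write_session_ticket(io, data)
  io.write_nonblock(data)
  :sent
rescue Errno::ECONNRESET, Errno::EPIPE
  :suppressed
end
```

Whether silently swallowing these errors at the Ruby level is appropriate is exactly the open question above.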
> This PR currently doesn't try to handle this, and it's possible to cause a segfault
Indeed, I didn't know that doing that was dangerous; I thought that `rb_protect` could deal with it.
One option could be to assume that the IO-like object must indeed be an SSLSocket, and one could deal with it by setting an ivar "flag" telling the internal implementation to skip the 2nd-level `rb_protect`/`rb_jump_tag`? Other than that, I'm out of ideas.
> If both a callback and the underlying socket do a jump in the same single OpenSSL function call
Can they happen at the same time? Perhaps the suggestion above deals with it?
An alternative approach could be to go back to the drawing board and try the BIO_mem approach again. I think that approach could be more pragmatic, at the cost of having to do a lot of "callback to rubyland"; I'm just not sure at this point if the extension API allows that.
> Can they happen at the same time?
Please see `test_synthetic_io_errors_in_callback_and_socket` for an example. `SSLContext#servername_cb` is triggered when the server receives a ClientHello. If the callback doesn't succeed, the server then has to send an `unrecognized_name` alert. Both happen inside the same `SSL_accept()` function call.
I think the segfault is fixable by (ab)using `rb_ensure()`, but I wasn't sure in which way I should make it work:
> If both a callback and the underlying socket do a jump in the same single OpenSSL function call, which one should take precedence?
Sorry for not replying sooner; I've been swamped with other chores. I think "last exception wins" is reasonable, and considering this is a new feature, it's not really breaking compatibility.
Lmk when you work around the jump tag issue, I can run a few sanity checks on my side as well.
This branch needs more polishing, but it should be working aside from some edge cases. I'd appreciate it if you could test it.
I should have worded it more accurately: the last commit changed it so that the later tag jump (which includes raising an exception) wins over the earlier one. This should fix the segfault.
One concern is that this can accidentally suppress an important jump like the one created by Thread#kill. Please see this Ruby issue for details: https://bugs.ruby-lang.org/issues/13882
I haven't come up with a nice solution for this.
just a heads-up that I've been trying to incorporate this into my branch's CI, but I'm having issues, I guess due to C extension compilation and openssl being a stdlib gem (opened a ticket here, in case you know why that happens and can recommend a workaround).
That Gemfile snippet works as expected for me (ruby-build Ruby 3.4 on Linux). I'm not sure why it's failing for you. What happens if you clone the repository, run `gem build` to make a .gem archive, and try installing it?
I made some effort to build openssl from the branch in httpx CI, and now I'm getting this error all over the place:
```
FaradayTest#test_adapter_get_handles_compression:
RangeError: integer 139928495023720 too big to convert to 'int'
    /ruby-openssl/lib/openssl/buffering.rb:433:in 'OpenSSL::SSL::SSLSocket#syswrite_nonblock'
    /ruby-openssl/lib/openssl/buffering.rb:433:in 'OpenSSL::Buffering#write_nonblock'
    lib/httpx/io/tcp.rb:129:in 'HTTPX::TCP#write'
```
I may only be able to do some investigation in one week.
> RangeError: integer 139928495023720 too big to convert to 'int'
I was able to reproduce it locally. This is a bug in `#syswrite` and `#syswrite_nonblock` with a String-convertible object that responds to `#to_str` (such as HTTPX::Buffer). #881 should fix this.
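The kind of object involved can be illustrated like this (`StrLike` is an invented stand-in for something like HTTPX::Buffer; it is not code from either project):

```ruby
# Hypothetical stand-in for a String-convertible object: not a String,
# but implicitly convertible to one via #to_str, which is the case
# that triggered the bug in #syswrite/#syswrite_nonblock.
class StrLike
  def initialize(str)
    @str = str
  end

  def to_str
    @str
  end
end

buf = StrLike.new("hello")
# Implicit-conversion sites (including IO write paths) call #to_str:
String.try_convert(buf) # => "hello"
```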
thx, tests are passing now 👍
Just spent the day scratching my head over Net::HTTP's behavior around proxies, and didn't see the tickets that led to this PR until now, lol. Anything I can do to help here?