ssl: support IO-like object as the underlying transport
An implementation of #731. The test suite passes on my box, but it needs more testing, especially around error handling.
This adds support for IO-like objects that are not backed by a file descriptor by defining a BIO_METHOD that wraps the following Ruby methods:
- #read_nonblock(len, exception: false)
- #write_nonblock(str, exception: false)
- #wait_readable
- #wait_writable
- #flush
- #closed?
- #close
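For illustration, a minimal in-memory object satisfying this interface might look like the sketch below. This is not code from the PR; the class name `InMemoryIO` and the `#feed` helper are invented for this example.

```ruby
# Hypothetical sketch of an IO-like object that is not backed by a file
# descriptor but responds to the methods the BIO_METHOD wrapper calls.
class InMemoryIO
  attr_reader :wbuf

  def initialize
    @rbuf = +""   # bytes available to #read_nonblock
    @wbuf = +""   # bytes collected from #write_nonblock
    @closed = false
  end

  # Test helper (invented for this sketch): make bytes readable.
  def feed(str)
    @rbuf << str
    self
  end

  def read_nonblock(len, exception: true)
    raise IOError, "closed stream" if @closed
    if @rbuf.empty?
      # Mirror IO#read_nonblock: raise, or return :wait_readable when
      # called with exception: false.
      exception ? raise(IO::EAGAINWaitReadable) : :wait_readable
    else
      @rbuf.slice!(0, len)
    end
  end

  def write_nonblock(str, exception: true)
    raise IOError, "closed stream" if @closed
    str = str.to_str
    @wbuf << str
    str.bytesize
  end

  # In-memory buffers are always "ready"; a real object would block here.
  def wait_readable(timeout = nil)
    true
  end

  def wait_writable(timeout = nil)
    true
  end

  def flush
    self
  end

  def closed?
    @closed
  end

  def close
    @closed = true
    nil
  end
end
```

With this PR, an object shaped like this could be passed to `OpenSSL::SSL::SSLSocket.new` in place of a real socket.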
@rhenium anything I can help with to move this forward?
@rhenium friendly ping
I just pushed what I have in my local branch so far, rebased on top of current master to resolve merge conflicts. This is not ready to merge yet.
Some unresolved issues:
- We have many configurable callbacks that may be triggered by various methods on `SSLSocket`. If a callback attempts a tag jump (e.g., raising an exception), it is caught by `rb_protect()` and the jump tag is stored in the `ID_callback_state` ivar. After returning from the OpenSSL function, the ivar is checked and the jump is resumed. Currently, if a callback fails, no other callbacks will be executed from the same OpenSSL function call (unless I'm mistaken), so this works as expected.

  A jump from some callbacks is considered critical to the connection, and a TLS alert must be sent to the peer. For example, an exception in `SSLContext#servername_cb` should cause an `unrecognized_name` alert to be sent to the client. Since writing to the underlying socket requires running Ruby code, this means we need to execute Ruby code while a jump is still on hold. This PR currently doesn't try to handle this, and it's possible to cause a segfault (as you can see in the two failing test cases I added in a `[DO NOT MERGE]` commit). How should this be fixed?

  I initially tried saving `rb_errinfo()` temporarily and restoring it later with `rb_set_errinfo()`, but this can't work for all cases because `rb_set_errinfo()` only accepts exceptions.
- If both a callback and the underlying socket do a jump in the same single OpenSSL function call, which one should take precedence?
- OpenSSL suppresses a certain kind of error when writing a TLS 1.3 `NewSessionTicket` to the underlying socket, when it is harmless (https://github.com/openssl/openssl/blob/e599893a9fec932701ca824d73a794a0c9ce02e9/ssl/statem/statem_srvr.c#L852-L870). While we could rescue exceptions like `Errno::ECONNRESET` raised by `underlying_socket.write_nonblock` to replicate the behavior, I'm not sure if that would be a good idea. This PR currently doesn't handle this, so this is a slight difference compared to using the socket BIO.
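For the last point, the rescue being considered could look roughly like the sketch below. This is only an illustration of the idea, not what the PR does; `FakeSocket` and `write_session_ticket` are invented names.

```ruby
# Hypothetical sketch of suppressing a harmless write error the way
# OpenSSL does for TLS 1.3 NewSessionTicket delivery. FakeSocket stands
# in for an underlying IO-like object whose peer has already gone away.
class FakeSocket
  def write_nonblock(str, exception: true)
    raise Errno::ECONNRESET, "peer reset the connection"
  end
end

# Invented helper: attempt to deliver a session ticket, treating a
# reset/broken peer as harmless rather than propagating the error.
def write_session_ticket(io, data)
  io.write_nonblock(data)
  :sent
rescue Errno::ECONNRESET, Errno::EPIPE
  :suppressed
end
```

Whether silently swallowing these errors at the Ruby level is appropriate is exactly the open question above.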
> This PR currently doesn't try to handle this, and it's possible to cause a segfault
Indeed, I didn't know that doing that was dangerous; I thought that `rb_protect` could deal with it.
One option could be to assume that the IO-like object must indeed be an SSLSocket, and one could deal with it by setting an ivar "flag" telling the internal implementation to skip the 2nd-level `rb_protect`/`rb_jump_tag`? Other than that, I'm out of ideas.
> If both a callback and the underlying socket do a jump in the same single OpenSSL function call
Can they happen at the same time? Perhaps the suggestion above deals with it?
An alternative approach could be to go back to the drawing board and try the BIO_mem approach again. I think that approach could be more pragmatic, at the cost of having to do a lot of "callback to rubyland"; I'm just not sure at this point if the extension API allows that.
> Can they happen at the same time?
Please see `test_synthetic_io_errors_in_callback_and_socket` for an example. `SSLContext#servername_cb` is triggered when the server receives a ClientHello. If the callback doesn't succeed, the server then has to send an `unrecognized_name` alert. Both happen inside the same `SSL_accept()` function call.
I think the segfault is fixable by (ab)using `rb_ensure()`, but I wasn't sure in which way I should make it work:
> If both a callback and the underlying socket do a jump in the same single OpenSSL function call, which one should take precedence?
Sorry for not replying sooner; I've been swamped with other chores. I think "last exception wins" is reasonable, and considering this is a new feature, it's not really breaking compatibility.
Lmk when you work around the jump tag issue, I can run a few sanity checks on my side as well.
This branch needs more polishing, but it should be working aside from some edge cases. I'd appreciate it if you could test it.
I should have worded it more accurately: the last commit changed it so that the later tag jump (which includes raising an exception) wins over the earlier one. This should fix the segfault.
One concern is that this can accidentally suppress an important jump like the one created by Thread#kill. Please see this Ruby issue for details: https://bugs.ruby-lang.org/issues/13882
I haven't come up with a nice solution for this.
just a heads-up that I've been trying to incorporate this into my branch's CI, but I'm having issues, I guess due to C extension compilation and openssl being a stdlib gem (opened a ticket here, in case you know why that happens and can recommend a workaround).
That Gemfile snippet works as expected for me (ruby-build Ruby 3.4 on Linux). I'm not sure why it's failing for you. What happens if you clone the repository, run `gem build` to make a .gem archive, and try installing it?
I made some effort to build openssl from the branch in httpx CI, and now I'm getting this error all over the place:
```
FaradayTest#test_adapter_get_handles_compression:
RangeError: integer 139928495023720 too big to convert to 'int'
    /ruby-openssl/lib/openssl/buffering.rb:433:in 'OpenSSL::SSL::SSLSocket#syswrite_nonblock'
    /ruby-openssl/lib/openssl/buffering.rb:433:in 'OpenSSL::Buffering#write_nonblock'
    lib/httpx/io/tcp.rb:129:in 'HTTPX::TCP#write'
```
I may only be able to do some investigation in one week.
> RangeError: integer 139928495023720 too big to convert to 'int'
I was able to reproduce it locally. This is a bug in `#syswrite` and `#syswrite_nonblock` with a String-convertible object that responds to `#to_str` (such as HTTPX::Buffer). #881 should fix this.
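The kind of object involved can be illustrated like this (`StrLike` is an invented stand-in for something like HTTPX::Buffer; it is not code from either project):

```ruby
# Hypothetical stand-in for a String-convertible object: not a String,
# but implicitly convertible to one via #to_str, which is the case
# that triggered the bug in #syswrite/#syswrite_nonblock.
class StrLike
  def initialize(str)
    @str = str
  end

  def to_str
    @str
  end
end

buf = StrLike.new("hello")
# Implicit-conversion sites (including IO write paths) call #to_str:
String.try_convert(buf) # => "hello"
```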
thx, tests are passing now 👍
Just spent the day scratching my head over Net::HTTP's behavior around proxies, and didn't see the tickets that led to this PR until now, lol. Anything I can do to help here?