libcoro
.poll(coro::poll_op::read)ing from ssl connection is extremely slow
This is my code now: https://gist.github.com/niansa/b2d589e1ed67c44e753414298f30404c#file-gistfile1-txt-L67
For some reason using poll() is extremely slow here and I have to catch EAGAIN.
BTW: I figured out that each poll(coro::poll_op::read) takes exactly as long as my configured timeout! With timeout set to 0 it gets stuck ~~forever~~ for a very long time.
What could have gone wrong?
Here is a repository with the full code: https://github.com/niansa/libcoroExperiments (the timeout is set in main.cpp).
The first bytes get transferred as they should.
Setting the timeout to 1 causes:
free(): double free detected in tcache 2
Aborted (core dumped)
or even:
Segmentation fault (core dumped)
I am trying to pull a stack trace, but the program just exits with code 0 when I run it under GDB.
Omitting the timeout means it will wait for data forever, i.e. no timeout. That should generally be documented on the poll function.
It looks like you are calling recv before checking the poll result. You shouldn't call recv again immediately after getting EAGAIN, since that means there is currently no data on the socket to read. (edit: All sockets used through a coro::io_scheduler are always put into non-blocking mode.)
auto presult = co_await client.poll(coro::poll_op::read, timeout);
if (presult == coro::poll_status::event)
{
    // call recv
}
else
{
    // handle poll error, closed connection, or timeout
}
I changed it to this:
// Receive response
std::vector<char> response(256);
while (true) {
    auto pres = co_await client.poll(coro::poll_op::read, timeout);
    // Check if poll succeeded
    if (pres == coro::poll_status::event) {
        // Receive
        auto [recv_status, recv_bytes] = client.recv(response);
        // Handle error in receive
        if (recv_status != coro::net::recv_status::ok || !(co_await cb(recv_bytes))) {
            co_return recv_status == coro::net::recv_status::closed; // <---
        }
    } else {
        co_return pres != coro::poll_status::closed;
    }
}
But it just returns at the line I am pointing at, because read() returns EAGAIN even though I am now checking the result of the poll.
How long is your timeout? I can't tell from the prior snippet. Are you getting any data at all, or is it exiting immediately? EAGAIN should just mean there is no data currently available to read.
Running curl to verify that the server responds externally wouldn't hurt either, to make sure whatever you're using to serve data back to this client is set up correctly.
I'd also recommend checking the status returns from connect and ssl_handshake, things could be going wrong there too!
ssl_handshake returns successfully. I am checking that.
Current timeout is set to 5 seconds, and I am getting no data at all. Strace shows me that read() returns EAGAIN. Should I just check for EAGAIN and if that occurs just poll again?
A 5 second timeout seems plenty long for any non-overloaded server to return in time, have you verified you can curl the same url successfully?
Probably kind of hard for me to guess at what else is going wrong TBH, but considering all the tests for the project I'm inclined to think it is something with the setup in some fashion. Can you possibly drop SSL/TLS to try and get it working without that layer?
> A 5 second timeout seems plenty long for any non-overloaded server to return in time,
Alright, I changed default value to 0.
> have you verified you can curl the same url successfully?
Yep and it works just as it should
> Can you possibly drop SSL/TLS to try and get it working without that layer?
Yep, it works without SSL
I updated my repository with my current code, feel free to mess with it yourself:
https://github.com/niansa/libcoroExperiments
Looks pretty straightforward, I'll give it a go in a bit!
Ok, I looked over my tests for this and your code and managed to get it to work with SSL with a single change:
while (true)
{
    std::cerr << "client.poll(read)\n";
    pstatus = co_await client.poll(coro::poll_op::read);
    REQUIRE(pstatus == coro::poll_status::event);
    std::cerr << "client.recv()\n";
    auto [rstatus, rspan] = client.recv(response);
    if (rstatus == coro::net::recv_status::would_block)
    {
        std::cerr << coro::net::to_string(rstatus) << "\n";
        continue;
    }
This is the test. It has an explicit "would block" check and then loops. SSL seems to produce an additional read-ready event even though no data is available yet, so if you add a similar check and loop through poll -> recv, you'll get the data you expect. I managed to get your program to work just by adding this.
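For reference, the poll -> recv retry pattern described above is the same one used with plain non-blocking descriptors: a readiness event is only a hint, so a read that reports EAGAIN/EWOULDBLOCK is answered by polling again rather than treated as an error. The sketch below uses POSIX poll() and read() directly, not libcoro; read_with_retry and its parameters are hypothetical names chosen for illustration.

```cpp
#include <algorithm>
#include <cerrno>
#include <cstddef>
#include <string>

#include <fcntl.h>
#include <poll.h>
#include <unistd.h>

// Hypothetical helper (not a libcoro API): read up to `want` bytes from a
// non-blocking descriptor, polling again whenever the read would block.
// Returns whatever was collected before EOF, a timeout, or a hard error.
std::string read_with_retry(int fd, std::size_t want, int timeout_ms)
{
    std::string out;
    char buf[256];

    while (out.size() < want) {
        pollfd pfd{};
        pfd.fd = fd;
        pfd.events = POLLIN;

        const int ready = ::poll(&pfd, 1, timeout_ms);
        if (ready < 0 && errno == EINTR) {
            continue; // interrupted by a signal, poll again
        }
        if (ready <= 0) {
            break; // 0 means timeout, negative means poll failed
        }

        const std::size_t chunk = std::min(sizeof(buf), want - out.size());
        const ssize_t n = ::read(fd, buf, chunk);
        if (n > 0) {
            out.append(buf, static_cast<std::size_t>(n));
        } else if (n == 0) {
            break; // peer closed the connection
        } else if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR) {
            continue; // spurious readiness: nothing to read yet, poll again
        } else {
            break; // hard error
        }
    }
    return out;
}
```

In libcoro terms, the `continue` on EAGAIN corresponds to the recv_status::would_block branch in the test snippet above.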
I updated the repository with the changes as you said. It's still broken for me.
Maybe you could take a quick look at the newest commit? What did I do wrong?
$ ./libcoroExperiments
HTTP/1.1 200 OK
Accept-Ranges: bytes
Age: 348592
Cache-Control: max-age=604800
Content-Type: text/html; charset=UTF-8
Date: Mon, 26 Jul 2021 21:03:15 GMT
Etag: "3147526947"
Expires: Mon, 02 Aug 2021 21:03:15 GMT
Last-Modified: Thu, 17 Oct 2019 07:18:26 GMT
Server: ECS (dna/63B7)
Vary: Accept-Encoding
X-Cache: HIT
Content-Length: 1256
<!doctype html>
<html>
<head>
<title>Example Domain</title>
<meta charset="utf-8" />
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<style ty
I ran your code as is and this is the output I got. It doesn't seem to recognize whether the entire stream of data has arrived yet. You probably need an HTTP response parser to track how much of the response you've received so far?
It actually gets stuck there polling. It is actually WAITING for more data... But for some reason all it gets is EAGAIN.
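A minimal sketch of such tracking, assuming the server sends a Content-Length header (as in the output above) and does not use chunked transfer encoding. The names http_progress and check_response are hypothetical and not part of libcoro; a real client would want a full HTTP parser.

```cpp
#include <algorithm>
#include <cctype>
#include <charconv>
#include <cstddef>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical result type describing how much of a response has arrived.
enum class http_progress { need_headers, need_body, complete, unknown_length };

// Lower-case copy, so header names can be matched case-insensitively.
static std::string to_lower(std::string_view s)
{
    std::string out(s);
    std::transform(out.begin(), out.end(), out.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return out;
}

// Inspect everything received so far and report whether more bytes are needed.
http_progress check_response(std::string_view buf)
{
    constexpr std::string_view terminator = "\r\n\r\n";
    const auto header_end = buf.find(terminator);
    if (header_end == std::string_view::npos) {
        return http_progress::need_headers; // header block not finished yet
    }

    const std::string headers = to_lower(buf.substr(0, header_end));
    constexpr std::string_view key = "\r\ncontent-length:";
    const auto pos = headers.find(key);
    if (pos == std::string::npos) {
        return http_progress::unknown_length; // e.g. chunked or close-delimited
    }

    // Skip optional whitespace between the colon and the number.
    std::size_t i = pos + key.size();
    while (i < headers.size() && (headers[i] == ' ' || headers[i] == '\t')) {
        ++i;
    }

    std::size_t length = 0;
    const auto parsed = std::from_chars(headers.data() + i,
                                        headers.data() + headers.size(), length);
    if (parsed.ec != std::errc{}) {
        return http_progress::unknown_length; // header value is not a number
    }

    const std::size_t body_bytes = buf.size() - (header_end + terminator.size());
    return body_bytes >= length ? http_progress::complete : http_progress::need_body;
}
```

The receive loop would append each received chunk to one buffer and stop polling once check_response reports complete, instead of waiting for a read event that never comes.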
Hmm, right you are. Interesting.
Ok, this appears to be related to the SSL_CTX_set_read_ahead setting: epoll/poll don't see any remaining data because it has already been pulled off the socket but hasn't been processed by OpenSSL yet. I've got a working test case that replicates this and I think fixes it; I'll push a PR in a bit. It seems that in my original test cases the data was so small that it never triggered this issue, so good find!
edit: maybe not, I removed all the code changes except the test case and it's still passing x.x
I think I'm just going to drop SSL support; it should realistically be handled by the edge load balancer anyway for just about any app.
This may be resolved by #100. Since this is such an old issue I'm going to close it; if there are problems again we can open a new issue.