rtpmidid sometimes goes into a 100% CPU loop when an fd is closed

Aug 20 '21 14:08 folkertvanheusden

Looks a bit overkill as any exception on the callback of the poller will cause it to be removed. This should be only for some exceptions about the fd being invalid or similar. Si maybe a new exception of "invalid_fd" or similar.

Aug 20 '21 15:08 davidmoreno

What about c5d6c2927ff76595e79d837d0fce7372acab873d?

There is something strange with this code from poller.cpp:

    } catch (const invalid_fd &e) {
      ERROR("Caught exception at poller: {}, forgetting fd {}.", e.what(), fd);
      remove_fd(fd);
    }

and rtpclient.cpp:

void rtpclient::data_ready(rtppeer::port_e port) {
  uint8_t raw[1500];
  struct sockaddr_in6 cliaddr;
  unsigned int len = sizeof(cliaddr);
  auto socket = port == rtppeer::CONTROL_PORT ? control_socket : midi_socket;
  auto n = recvfrom(socket, raw, 1500, MSG_DONTWAIT,
                    (struct sockaddr *)&cliaddr, &len);
  // DEBUG("Got some data from control: {}", n);
  if (n < 0) {
    if (errno == EBADF)
      throw invalid_fd("Filedescriptor {} for {}/{}-port is invalid",
                      socket, peer.remote_name,
                      port == rtppeer::CONTROL_PORT ? "control" : "MIDI");

Aug 20 19:16:01 lappiemctopface rtpmidid[33230]: [poller.cpp:204] Caught exception at poller: Filedescriptor 18 for DOREMiDi-MIDI/MIDI-port is invalid, forgetting fd 10.

I don't understand why it is 18 in data_ready and 10 in the poller.

Aug 20 '21 17:08 folkertvanheusden

I think you might be getting a fd problem called from an event of another fd.

The invalid_fd exception should say which fd had the exception, and then remove that one from the poller.

Aug 20 '21 17:08 davidmoreno

I think you might be getting a fd problem called from an event of another fd.

But how?

The invalid_fd exception should say which fd had the exception, and then remove that one from the poller.

I just realised that in case of a pair of sockets (control/data) closing one would leave the other dangling.

Aug 20 '21 18:08 folkertvanheusden

The how, storing the fd at the exception and using it later.

About dangling.. if one fails the other should fail too sooner or later, isnt it? If not then maybe we should add some callback to be called when a fd is removed... or is this getting too complicated?

Aug 20 '21 18:08 davidmoreno

About the other FD: true.

"I think you might be getting a fd problem called from an event of another fd."

My question was: how can this happen? It comes from the same thread.

Op vr 20 aug. 2021 20:08 schreef David Moreno Montero < @.***>:

The how, storing the fd at the exception and using it later.

About dangling.. if one fails the other should fail too sooner or later, isnt it? If not then maybe we should add some callback to be called when a fd is removed... or is this getting too complicated?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/davidmoreno/rtpmidid/pull/73#issuecomment-902867087, or unsubscribe https://github.com/notifications/unsubscribe-auth/AUN5IW75U3HQWDJ3UBO7VJ3T52KYRANCNFSM5CQN4J4A . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email .

Aug 20 '21 18:08 folkertvanheusden

I can only think that becaus eof whatever happenend on the first fd affected to do something on the second. For example an ALSA event may cause to send some network data.

Do you know https://rr-project.org/ ? I did try it years ago, and dont know if it will help, but I think it can record the full history of one crash, and then we can inspect why it got there.

Can you give it a try?

Aug 21 '21 10:08 davidmoreno

I had never heard of rr. Thanks for pointing me at it! I'll give it a try.

On Sat, Aug 21, 2021 at 12:14 PM David Moreno Montero < @.***> wrote:

I can only think that becaus eof whatever happenend on the first fd affected to do something on the second. For example an ALSA event may cause to send some network data.

Do you know https://rr-project.org/ ? I did try it years ago, and dont know if it will help, but I think it can record the full history of one crash, and then we can inspect why it got there.

Can you give it a try?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/davidmoreno/rtpmidid/pull/73#issuecomment-903094546, or unsubscribe https://github.com/notifications/unsubscribe-auth/AUN5IW5F5QW6LNCLBIHSFGLT554BDANCNFSM5CQN4J4A . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email .

Aug 21 '21 11:08 folkertvanheusden

Oh that won't work: https://github.com/rr-debugger/rr/issues/2873 They're disabling alsa-access in rr.

On Sat, Aug 21, 2021 at 1:48 PM Folkert van Heusden @.***> wrote:

I had never heard of rr. Thanks for pointing me at it! I'll give it a try.

On Sat, Aug 21, 2021 at 12:14 PM David Moreno Montero < @.***> wrote:

I can only think that becaus eof whatever happenend on the first fd affected to do something on the second. For example an ALSA event may cause to send some network data.

Do you know https://rr-project.org/ ? I did try it years ago, and dont know if it will help, but I think it can record the full history of one crash, and then we can inspect why it got there.

Can you give it a try?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/davidmoreno/rtpmidid/pull/73#issuecomment-903094546, or unsubscribe https://github.com/notifications/unsubscribe-auth/AUN5IW5F5QW6LNCLBIHSFGLT554BDANCNFSM5CQN4J4A . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&utm_campaign=notification-email .

Aug 21 '21 12:08 folkertvanheusden

Code would not apply on current master.

Please open a issue or another pull request.

Thanks a lot anyway for the effort for creating this pull request.

Jan 28 '24 21:01 davidmoreno

rtpmidid rtpmidid copied to clipboard

sometimes goes into a 100% CPU loop when an fd is closed

rtpmidid
rtpmidid copied to clipboard