What does "failed to send app message" mean?

Open travisbell opened this issue 4 years ago • 2 comments

Hey everyone,

We had a big spike of traffic that overloaded the capacity of one of our services backed by Unit, and after this event some (not all) of the instances started returning this message:

2021/07/21 15:02:29 [alert] 39#80 *262671 stream #2155672, app 'app': failed to send app message
2021/07/21 15:02:29 [alert] 35#75 *260517 stream #2155759, app 'app': failed to send app message
2021/07/21 15:02:29 [alert] 35#75 *260527 stream #2155758, app 'app': failed to send app message
2021/07/21 15:02:29 [alert] 38#77 *252973 stream #2155773, app 'app': failed to send app message
2021/07/21 15:02:28 [alert] 38#77 *252966 stream #2155772, app 'app': failed to send app message
2021/07/21 15:02:28 [alert] 39#80 *262692 stream #2155671, app 'app': failed to send app message

What does this error mean? Clearly Unit was in a bad state, as Unit was returning nothing but 500 errors for all of these requests. I can't seem to get a hold of any more meaningful log messages, though, so I don't know what happened to Unit. Oh ya, this is a Ruby app:

{
  "listeners": {
    "*:8080": {
      "pass": "routes/app"
    }
  },
  "routes": {
    "app": [
      {
        "match": {
          "uri": "/app*"
        },
        "action": {
          "pass": "applications/app"
        }
      }
    ]
  },
  "applications": {
    "app": {
      "type": "ruby",
      "processes": 4,
      "threads": 2,
      "script": "config.ru",
      "working_directory": "/path/to/app/",
      "limits": {
        "timeout": 55
      }
    }
  }
}

Do you guys have an idea as to what's going on?

travisbell, Jul 21 '21 15:07

Hello.

Sorry for the delayed answer.

The message "failed to send app message" appears in the log when the application's message queue is full of request messages (130K) that the application process(es) have not yet started to handle. This may be caused by a request rate that exceeds the capacity of the application processes.

To avoid this message in the future, it makes sense to configure more processes, either statically or dynamically (link).
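
As a rough sketch of the dynamic option, the "processes" setting in the "applications" section can be an object instead of an integer. The numbers below ("max": 16, "spare": 4, "idle_timeout": 20) are illustrative assumptions, not recommendations taken from this thread; they would need tuning against the real traffic profile and the memory each Ruby process uses. The rest of the application block is copied from the configuration in the original post.

```json
{
  "applications": {
    "app": {
      "type": "ruby",
      "processes": {
        "max": 16,
        "spare": 4,
        "idle_timeout": 20
      },
      "threads": 2,
      "script": "config.ru",
      "working_directory": "/path/to/app/",
      "limits": {
        "timeout": 55
      }
    }
  }
}
```

With settings like these, Unit should keep a few spare processes ready, start more on demand up to the "max" limit, and stop idle ones after "idle_timeout" seconds. The static alternative is simply a larger integer for "processes".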

Does Unit recover to a normal state after the traffic spike subsides?

mar0x, Dec 02 '21 16:12

The message "failed to send app message" appears in the log when the application's message queue is full of request messages (130K) that the application process(es) have not yet started to handle.

Aha, ok, perfect. This is the information I was looking for. Yes, as I mentioned, in this case we had a huge spike of traffic (the internet will internet 😜), and while we autoscale our services, Unit became overwhelmed before we could scale up.

Does Unit recover to a normal state after the traffic spike subsides?

So this is the more interesting part: no. It appeared that once Unit got into this state, it became completely unresponsive, and I had to restart it before it would resume taking requests. Basically, the node just stopped working and never came back.

travisbell, Dec 02 '21 19:12