honeybadger-php icon indicating copy to clipboard operation
honeybadger-php copied to clipboard

bug: slower response times when events api are enabled

Open subzero10 opened this issue 1 year ago • 8 comments

A user reported they noticed 300ms slower response times when they enabled Insights.

The user suggested to send the notice after the response is returned.

subzero10 avatar Oct 24 '24 06:10 subzero10

Update: Unless, I'm missing something here, I don't see an issue with our original approach to use a shutdown hook:

  • Events are queued
  • The shutdown function is called after the response is sent.

There are two cases where the events will be sent before the response is sent to the client:

  1. The number of events in the queue exceeds the threshold (default is 50).
  2. Some time has passed since the last time events were sent (defaults to 2 seconds). This time is set as soon as the class is instantiated, which means that if a request is queuing events for more than 2 seconds and has not finished yet, a request will be made to the Events API even if the response is not yet sent to the client.

If the user is consistently seeing significant slower response times when enabling Insights, it is possible that they are queueing a lot of events before responding to the client, hence triggering the 1st case from above. In this case, they can simply increase the threshold.

Sending events using a terminable middleware: Slack 2024-11-15 09 57 47

Sending events to insights using the shutdown hook: Arc 2024-11-15 09 57 11

subzero10 avatar Nov 15 '24 08:11 subzero10

@subzero10 so that I'm clear:

In cases 1 and 2 above, we're blocking the response while the events are being sent? Or are we sending the events asynchronously, i.e. in a separate thread? We should never be sending events synchronously imo. Am I missing something?

joshuap avatar Nov 15 '24 22:11 joshuap

In cases 1 and 2 above, we're blocking the response while the events are being sent?

Yes, that’s correct. PHP applications do not use multi-threading. Each request is handled by a separate process, typically managed by tools like PHP-FPM.

To achieve non-blocking processing, PHP applications leverage a worker process (e.g., a job queue service) to handle tasks asynchronously. However, not all PHP applications use such queues, and identifying their presence can be non-trivial. Though, for Laravel apps specifically, detecting a queue setup should be straightforward.

We should never be sending events synchronously imo.

Agreed, in most cases involving HTTP requests, synchronous event sending is suboptimal due to the potential for delayed responses. However, synchronous processing is sometimes acceptable, such as when running PHP outside the context of an HTTP request. In those scenarios (e.g., CLI processes or worker services), response time is less critical, as each job runs independently and on its own process.


Our current approach provides a general-purpose solution for PHP and Laravel applications:

  1. PHP code executed in the context of a web server:
  • Events are queued in memory and sent in batches before the process shuts down (after the response is sent).
  • For cases where a request generates many events over an extended period, sending batches synchronously during request processing is an option. While this may delay the response slightly (a few hundred milliseconds), it’s often acceptable when response time is not a strict requirement.
  1. PHP code executed in CLI (e.g., artisan) or worker services:
  • In these cases, processes are long-running and independent. Synchronous or asynchronous event sending has minimal impact on the overall operation.

Trade-off: The current approach may become inefficient when a large number of events are generated in a short time, as sending them synchronously could introduce bottlenecks.

Potential Improvements: To better address varying use cases, we can adapt our implementation based on the application context:

  1. HTTP Requests:
  • Queue events in memory and send them in batches after the response is sent.
  • For Laravel apps with a queue system:
    • Dispatch a HoneybadgerSendEventsJob to handle event sending asynchronously.
  1. Non-HTTP Requests:
  • Send events in batches as the in-memory queue fills.
  • For Laravel apps with a queue system:
    • Dispatch a HoneybadgerSendEventsJob for asynchronous processing.

If you agree with the above, I can create issues to start working on them. Let me know if there’s anything else you’d like clarified or adjusted.

subzero10 avatar Nov 16 '24 08:11 subzero10

To better address varying use cases, we can adapt our implementation based on the application context:

  1. HTTP Requests:
  • Queue events in memory and send them in batches after the response is sent.

  • For Laravel apps with a queue system:

    • Dispatch a HoneybadgerSendEventsJob to handle event sending asynchronously.

The big change here from how it behaves currently (and the solution to this users problem) would be to use the queue system, right? It already queues the events in memeory and sends them in batches after the response is sent (unless there the batch fills up)?

joshuap avatar Nov 21 '24 01:11 joshuap

The big change here from how it behaves currently would be to use the queue system, right?

Correct. Basically, I'm thinking that we would implement more than one mechanism to send events and choose the best one based on configuration. I don't know yet if this can be done entirely automatically without any further configuration (current research shows that we can't) because even though Laravel's configuration may point to a queue, the queue process may not be running (i.e. this is a local/dev/staging/testing environment). This may cause confusion. We will have to make sure this is well documented.

(and the solution to this users problem)

We don't know for sure that this user's problem was the fact that they were generating too many events in a short amount of time. If indeed this was the problem, they could configure the batch threshold to be higher (the default is 50) and their problem would go away even with the current implementation. 🤔 On a similar note, should we raise the default number to something bigger (i.e. 100)?

It already queues the events in memeory and sends them in batches after the response is sent (unless there the batch fills up)?

Yes, correct. This is the job of the BulkEventDispatcher.php:

  • It pushes events to an array and
  • checks if the array is over the threshold (defaults to 50) or the last dispatch time is greater than the interval (defaults to 2 seconds).
  • When the process is ready to shutdown (after HTTP response has been sent), the flushEvents() is called to send any remaining events.

subzero10 avatar Nov 21 '24 07:11 subzero10

I'm fine with bumping the default threshold to 100 and recommending that the impacted customer try adjusting that setting.

Instead of trying to auto-detect (and use) a queueing system, how about we make it an option in the config to enable a queueing system (from a list of supported systems) for sending payloads? Then the customer can choose to use/enable the plugin. This would be analogous to a Ruby user opting to use Sidekiq jobs for sending events.

We could also point customers to using Vector as a proxy: configure the PHP app to deliver payloads to a locally-running Vector instance that is configured to send events on to our API. This would avoid potential network delays, which might have contributed to the reported slowdown.

stympy avatar Nov 21 '24 18:11 stympy

I'm fine with bumping the default threshold to 100 and recommending that the impacted customer try adjusting that setting.

👍 OK I will submit a PR for this.

Instead of trying to auto-detect (and use) a queueing system, how about we make it an option in the config to enable a queueing system (from a list of supported systems) for sending payloads? Then the customer can choose to use/enable the plugin. This would be analogous to a Ruby user opting to use Sidekiq jobs for sending events.

After some more thinking, I'm leaning towards this approach as well. Auto-detection will not always work - in that case we will have to fall back to the default implementation. But this will introduce another debugging step to find out which implementation was used. It will probably create more problems along the way, such as queuing Honeybadger events to the wrong queue. We could create a new implementation and allow people to enable it instead of the default approach and allow configuring it as well (i.e. queue connection and queue name). @joshuap, if you are OK with this, we can create an issue and start working on it.

We could also point customers to using Vector as a proxy: configure the PHP app to deliver payloads to a locally-running Vector instance that is configured to send events on to our API. This would avoid potential network delays, which might have contributed to the reported slowdown.

Correct, that's also an option. I'm thinking that it would be a solution for people that have already witnessed the value of Honeybadger Insights (or are familiar with concepts such as wide events) and would go the extra effort of setting up Vector in their infrastructure.

subzero10 avatar Nov 22 '24 07:11 subzero10

After some more thinking, I'm leaning towards this approach as well. Auto-detection will not always work - in that case we will have to fall back to the default implementation. But this will introduce another debugging step to find out which implementation was used. It will probably create more problems along the way, such as queuing Honeybadger events to the wrong queue. We could create a new implementation and allow people to enable it instead of the default approach and allow configuring it as well (i.e. queue connection and queue name). @joshuap, if you are OK with this, we can create an issue and start working on it.

I'm good with this. I agree that it should be user-configured instead of automatically detected.

joshuap avatar Dec 13 '24 20:12 joshuap

Note for future reference:

  • For Laravel apps, we moved away from the shutdown listener in favor of Terminable Middleware: https://github.com/honeybadger-io/honeybadger-laravel/pull/161

subzero10 avatar Jul 30 '25 16:07 subzero10

Closing this since the code has changed/improved since the ticket was originally opened. We have since changed the default thresholds for the events dispatcher + moved flushEvents to terminable middleware for Laravel projects.

subzero10 avatar Aug 26 '25 11:08 subzero10