reverb icon indicating copy to clipboard operation
reverb copied to clipboard

High CPU usage

Open bbashy opened this issue 1 year ago • 28 comments

Reverb Version

v1.0.0-beta4

Laravel Version

11.0.8

PHP Version

8.2.17

Description

After a day or so, I'm getting 100% CPU usage on the reverb:start process. If I restart reverb, it goes down to 1% CPU.

It's currently doing 160 odd users online in presence channel and that's about it. Had the same problem with this package: https://github.com/beyondcode/laravel-websockets/issues/379

Screenshot 2024-03-23 at 7 49 30 PM

Steps To Reproduce

Have a reverb server running via reverb:start for a day with many presence connections (150+).

bbashy avatar Mar 23 '24 20:03 bbashy

Hey @bbashy, how many messages are you sending across the connections and what sort of server are you running?

Also, are you running Reverb alongside another an application or standalone?

joedixon avatar Mar 26 '24 10:03 joedixon

Hi.

Using it as a presence channel for online user list. Hit 180 users at certain times. It is used on most pages across the site, so people will be leaving/joining on page change. People visit a few pages every few seconds.

Using a Hetzner cloud server AMD EPYC (8c 16GB).

No other applications, this server is just for this site. Was using Soketi (no CPU issues) but it doesn't work on node v20, which is why I switched to Reverb after Laravel upgrade.

Echo.join('online-users')
    .here((users) => {
        this.onlineUsers = users;
    })
    .joining((user) => {
        this.onlineUsers.push(user);
    })
    .leaving((user) => {
        this.onlineUsers = this.onlineUsers.filter(u => u.id !== user.id);
    });

In channels.php it's just a return of user data. Username, country, link to profile.

It's been fine since I deployed last - 24/03/2024 22:00 UTC.

bbashy avatar Mar 26 '24 10:03 bbashy

Thanks for the reply. Going to close this for now, but please mention me in this thread if you see the issue again and I will reopen.

joedixon avatar Mar 26 '24 14:03 joedixon

@joedixon Started happening again. If there's any debugging I can do, let me know.

Screenshot 2024-03-27 at 9 12 33 PM

bbashy avatar Mar 27 '24 21:03 bbashy

@bbashy Would be good to get a better picture of your environment and use case. Would you mind answering the following:

  1. Does the CPU continually increase over time or does it suddenly spike to 100%?
  2. If you leave it, does the CPU eventually drop back down?
  3. Are you using the Pulse integration?
  4. Do you have debug mode enabled?
  5. Are you broadcasting messages (if so, roughly how many) or just using for presence indication?
  6. Are you able to use tinker to run the code samplese below to get the current number of connections and channels:
// Connections
app(\Illuminate\Broadcasting\BroadcastManager::class)
  ->getPusher()
  ->get('/connections');

// Channels
app(\Illuminate\Broadcasting\BroadcastManager::class)
  ->getPusher()
  ->get('/channels', ['info' => 'subscription_count']);

joedixon avatar Mar 28 '24 08:03 joedixon

  1. I've only noticed it being 1% or 99-100%.
  2. It's still at that since my last reply (16 hours).
  3. No but I plan to once out of beta. Shall I install?
  4. No. Logs just show "INFO Stopping server on 0.0.0.0:2096. INFO Starting secure server on 0.0.0.0:2096 (example.com)." etc.
  5. We have a 'chat room' but it's hardly used.
  6. See below.
Psy Shell v0.12.2 (PHP 8.2.17 — cli) by Justin Hileman
> app(\Illuminate\Broadcasting\BroadcastManager::class)->getPusher()->get('/connections');
= {#5634
    +"connections": 117,
  }

> app(\Illuminate\Broadcasting\BroadcastManager::class)->getPusher()->get('/channels', ['info' => 'subscription_count']);
= {#5606
    +"channels": {#5612
      +"servers-booked": {#5610
        +"subscription_count": 14,
      },
      +"stats-counter": {#5605
        +"subscription_count": 14,
      },
      +"presence-public-chat": {#5607},
    },
  }

bbashy avatar Mar 28 '24 14:03 bbashy

No need to install Pulse, just wanted to rule it out as a factor.

Would be interesting to run with debug mode enabled for a little while (which in itself will increase CPU) to see if anything untoward is happening when the CPU spikes.

joedixon avatar Mar 28 '24 15:03 joedixon

Running with debug now and I'm running a script to gather the cpu % every 2 seconds.

Will report here with data.

bbashy avatar Mar 31 '24 22:03 bbashy

Gone up to 100% again, here's the cpu usage, pretty much instant, not gradual. Running in debug for 24 hours.

2024-04-01 20:33:46 :: reverb:start --debug [2282066] CPU Usage: 1%
2024-04-01 20:33:48 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:33:50 :: reverb:start --debug [2282066] CPU Usage: 3%
2024-04-01 20:33:52 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:33:54 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:33:56 :: reverb:start --debug [2282066] CPU Usage: 3%
2024-04-01 20:33:58 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:34:00 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:34:02 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:34:04 :: reverb:start --debug [2282066] CPU Usage: 4%
2024-04-01 20:34:06 :: reverb:start --debug [2282066] CPU Usage: 1%
2024-04-01 20:34:08 :: reverb:start --debug [2282066] CPU Usage: 1%
2024-04-01 20:34:10 :: reverb:start --debug [2282066] CPU Usage: 4%
2024-04-01 20:34:12 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:34:14 :: reverb:start --debug [2282066] CPU Usage: 3%
2024-04-01 20:34:16 :: reverb:start --debug [2282066] CPU Usage: 2%
2024-04-01 20:34:18 :: reverb:start --debug [2282066] CPU Usage: 3%
2024-04-01 20:34:20 :: reverb:start --debug [2282066] CPU Usage: 52%
2024-04-01 20:34:22 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:24 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:26 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:28 :: reverb:start --debug [2282066] CPU Usage: 99%
2024-04-01 20:34:30 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:32 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:34 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:36 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:38 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:40 :: reverb:start --debug [2282066] CPU Usage: 99%
2024-04-01 20:34:42 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:44 :: reverb:start --debug [2282066] CPU Usage: 100%
2024-04-01 20:34:46 :: reverb:start --debug [2282066] CPU Usage: 99%
2024-04-01 20:34:48 :: reverb:start --debug [2282066] CPU Usage: 100%

The reverb logs don't show much apart from maybe this? This is quite regular in the whole log even when it was low usage.

Message Handled ........................................ 285448861.320304079  
Connection Closed ...................................... 880659721.152382869  
Pruning Stale Connections ..................................................  
Pinging Inactive Connections ...............................................  
Connection Pinged ...................................... 373144331.352179112  
Connection Pinged ...................................... 770897394.824165909  
Connection Pinged ...................................... 128326687.479625382  
Connection Pinged ...................................... 238258499.280963052  
Connection Pinged ...................................... 711357617.733355114  
Connection Pinged ...................................... 913581555.461208734  
Connection Pinged ...................................... 750335547.483737098  
Connection Pinged ...................................... 824799629.748502904  
Connection Pinged ...................................... 373757589.569113713  
Connection Pinged ....................................... 88650420.659599424  
Connection Pinged ...................................... 875588527.671792201  
Connection Pinged ...................................... 645841583.993685622  
Connection Pinged ....................................... 69443263.770122343  
Connection Pinged ....................................... 969366386.39638130  
Connection Pinged ....................................... 97541188.835314754  
Connection Pinged ...................................... 336550205.253472927  
Connection Pinged ...................................... 837108525.856830003  
Connection Pinged ...................................... 843470688.187968972  
Connection Pinged ...................................... 903186882.544248154  
Connection Pinged ....................................... 70865328.772779885  

then after 170~ lines of that, it does loads of;

Message Received ....................................... 913581555.461208734  

1    {
2        "event": "pusher:ping",
3        "data": []
4    }

Message Handled ........................................ 913581555.461208734  
Message Received ......................................... 746315846.6165177  

1    {
2        "event": "pusher:ping",
3        "data": []
4    }

bbashy avatar Apr 01 '24 21:04 bbashy

@bbashy the logs look normal, but I'm interested to see if the issue is related to the pinging of connections which Reverb defaults to running every 60 seconds. Can you try increasing the value of REVERB_APP_PING_INTERVAL to see whether that makes any difference. Perhaps starting at 1 hour (3600) and increasing from there.

joedixon avatar Apr 04 '24 11:04 joedixon

I've set REVERB_APP_PING_INTERVAL=3600 and I've restarted reverb. Back to 0-1% CPU and will run the monitoring for the process.

Must be something to do with the amount of pings along with the connections subscribe/unsubscribe.

I use it as an online list and it's included on most pages, so page change will be a disconnect/connect.

bbashy avatar Apr 04 '24 12:04 bbashy

Right, but I would say that's pretty standard usage. Perhaps the difference is the online list as that will cause events to be broadcast across all connected clients on every connection / disconnection. Pusher actually limit this functionality to 100 connections only.

joedixon avatar Apr 04 '24 13:04 joedixon

Yep, I saw that and Soketi limit it too. I increased the limit on Soketi to 500 and the cpu usage was still below 10% even after a month of uptime.

The online list is only on the homepage though, other pages only have a counter, but I guess it still updates to all clients.

bbashy avatar Apr 04 '24 13:04 bbashy

Gone up again. Shall I try a higher ping interval? It's weird it goes straight up 0-100% in 4 seconds.

2024-04-04 19:28:00 :: reverb:start --debug [237336] CPU Usage: 2%
2024-04-04 19:28:02 :: reverb:start --debug [237336] CPU Usage: 0%
2024-04-04 19:28:04 :: reverb:start --debug [237336] CPU Usage: 1%
2024-04-04 19:28:06 :: reverb:start --debug [237336] CPU Usage: 0%
2024-04-04 19:28:08 :: reverb:start --debug [237336] CPU Usage: 1%
2024-04-04 19:28:10 :: reverb:start --debug [237336] CPU Usage: 1%
2024-04-04 19:28:12 :: reverb:start --debug [237336] CPU Usage: 0%
2024-04-04 19:28:14 :: reverb:start --debug [237336] CPU Usage: 1%
2024-04-04 19:28:16 :: reverb:start --debug [237336] CPU Usage: 0%
2024-04-04 19:28:18 :: reverb:start --debug [237336] CPU Usage: 0%
2024-04-04 19:28:20 :: reverb:start --debug [237336] CPU Usage: 87%
2024-04-04 19:28:22 :: reverb:start --debug [237336] CPU Usage: 100%
2024-04-04 19:28:24 :: reverb:start --debug [237336] CPU Usage: 99%
2024-04-04 19:28:26 :: reverb:start --debug [237336] CPU Usage: 100%
2024-04-04 19:28:28 :: reverb:start --debug [237336] CPU Usage: 100%
2024-04-04 19:28:30 :: reverb:start --debug [237336] CPU Usage: 100%

If it's not something others have had and we can't resolve, I'll look into changing the online user list to something ws message based.

bbashy avatar Apr 04 '24 19:04 bbashy

It would be good to try increasing the ping interval to see if it has any impact on when the CPU spikes. Agree it's strange that it ramps up so quickly.

Are you in a position to be able to temporarily disable the online user list so we can categorically determine that is actually the issue?

joedixon avatar Apr 05 '24 14:04 joedixon

@bbashy Sudden CPU spikes are definitely not to be expected and if you've encountered the exact same error in both the original Laravel Websockets package and now Reverb, this might in fact indicate a problem in the underlying Ratchet/ReactPHP components. Similar reports have been posted in the past and we have some ideas what could cause this, but we've never been able to reproduce this locally, see https://github.com/ratchetphp/Ratchet/issues/939#issuecomment-1098798274 / https://github.com/ratchetphp/Ratchet/issues/939#issuecomment-1101270289. If this matches your issue and you can reproduce this locally, please reach out and I'm happy to take a look at this.

clue avatar Apr 09 '24 09:04 clue

@clue Looks very similar and relates to my use-case of high connect/disconnect events. In theory, all we'd need to do is replicate a lot of users (100+) connecting/disconnecting. It started looking like it was because of that after I saw the time it shot up. 19:30-20:30 is when my site has the most users on it, and they'll be doing a lot of page changes (disconnect/connect).

bbashy avatar Apr 09 '24 10:04 bbashy

@bbashy I faced a similar problem. Supervisor is raised under the docker container, the reverb process is started there. This process was loaded at 100%, 500+ connections at the same time.

The solution turned out to be to add an extension as indicated in the documentation Event Cycle

Dockerfile

FROM php:8.3.3-fpm-bullseye
#....
#....
RUN apt-get install openssl
RUN docker-php-ext-install sockets
RUN pecl channel-update pecl.php.net
RUN apt-get update && apt-get install -y libevent-dev

RUN printf "\n\n\n\n\nno\nyes\n\n" | pecl install event
RUN docker-php-ext-enable event

Also installed outside the host system ulimit -n 32766

in the supervisor settings supervisor.conf minfds=10000

[supervisord]
nodaemon=true
chmod=0770
logfile=/var/log/supervisor/supervisord.log
pidfile=/var/run/supervisord.pid
minfds=10000

[program:laravel-websocket]
numprocs=8
process_name=%(program_name)s_%(process_num)02d
chmod=0770
command=php /var/www/html/artisan reverb:start
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/var/www/html/storage/logs/reverb.log

finally

image

Alex0Fury avatar May 21 '24 13:05 Alex0Fury

Might have fixed an issue for you, but I believe mine was slightly different.

I have no problems when running it under Soketi on the same machine/ulimit.

bbashy avatar May 21 '24 13:05 bbashy

finally

Возможно, я решил для вас проблему, но я думаю, что у меня было немного иначе.

I believe that our answers can help other developers to cope with such a problem. And the owners of the repository may take into account some of our practices

Alex0Fury avatar May 21 '24 13:05 Alex0Fury

This is still on my radar, I just haven't had the chance to take a deep dive on it just yet.

joedixon avatar May 22 '24 07:05 joedixon

Это все еще в моем поле зрения, просто у меня еще не было возможности глубоко погрузиться в него.

Yesterday I experimented with installing the package pecl install ev, which similarly saves the situation. In a production environment with 600+ connections, the load on the process has decreased. There is one running artisan reverb hanging in the artisan reverb:start

it is necessary to specify in php.ini extension=ev.so

image

Alex0Fury avatar May 22 '24 08:05 Alex0Fury

image

When the number of connections exceeds > 1000, the CPU 100 % increases at a strange rate and causes it to stop and restart. On an VPS ( 8core, 16GB RAM ) there is this problem

rawezhcode avatar May 26 '24 17:05 rawezhcode

When the number of connections exceeds > 1000, the CPU 100 % increases at a strange rate and causes it to stop and restart. On an VPS ( 8core, 16GB RAM ) there is this problem

Did you install the ev extension?

sneycampos avatar Aug 05 '24 11:08 sneycampos

image

When the number of connections exceeds > 1000, the CPU 100 % increases at a strange rate and causes it to stop and restart. On an VPS ( 8core, 16GB RAM ) there is this problem

In my case, the problem was solved after installing the ev extension. Currently, more than 2,500 users connect without problems

rawezhcode avatar Aug 05 '24 17:08 rawezhcode

I faced the same problem. I installed it on Forge Laravel using load balance 3 server All servers have 100% cpu usage. Is there any solution related to forge?

samehdoush avatar Sep 11 '24 06:09 samehdoush

Reverb will automatically switch to an ext-uv powered loop when available. Does ext-event still support it?

season886 avatar Sep 20 '24 01:09 season886

Anyone here have the same issue with uv installed? I'm trying to use the /up endpoint for health checks in k8s and I keep getting the pod failing the health checks and restarting. After some debugging I have determined that indeed I have seemingly sporadic requests (to /up) which take up to 60s.

I'm wondering if I should try event vs uv. I'm running in HA mode and also wondering if it could be connected to some operation connecting to redis in the /up endpoint. I currently have 2 replicas and it's pretty common that both exhibit the behavior at roughly the same time.

# /vendor/react/event-loop/src/Factory.php
    /**
     * @internal
     * @return LoopInterface
     */
    private static function construct()
    {
        // @codeCoverageIgnoreStart
        if (\function_exists('uv_loop_new')) {
            // only use ext-uv on PHP 7
            return new ExtUvLoop();
        }

        if (\class_exists('libev\EventLoop', false)) {
            return new ExtLibevLoop();
        }

        if (\class_exists('EvLoop', false)) {
            return new ExtEvLoop();
        }

        if (\class_exists('EventBase', false)) {
            return new ExtEventLoop();
        }

        if (\function_exists('event_base_new') && \PHP_MAJOR_VERSION === 5) {
            // only use ext-libevent on PHP 5 for now
            return new ExtLibeventLoop();
        }

        return new StreamSelectLoop();
        // @codeCoverageIgnoreEnd
    }

travisghansen avatar Mar 13 '25 22:03 travisghansen

Hello, is this issue fixed, i have plans to use laravel reverb for my app, is reverb really make that cpu usage to go that high??

muhammedjafer avatar Dec 13 '25 17:12 muhammedjafer