Poor performance under DDoS attack

Open OpenSource03 opened this issue 1 year ago • 13 comments

Current Behavior

Hello there!

I'd like to report an issue with the panel: even under a small DDoS attack, it struggles to keep up.

Some people reported that changing the cache and session drivers to memcached would improve things; however, when I switched to memcached I got a CSRF token mismatch. From what I found online, Memcached performs about as well as Redis, so I switched to Redis instead. Even so, there is a gigantic CPU load while under attack, most of it caused by the panel repeatedly communicating with the database for no apparent reason. Why does it need to query SQL when a random user connects to the website?!

Take a look at the resource usage during an attack of which only ~2MB bypassed mitigation: Screenshot 2022-08-04 at 02 31 36

Here is my configuration: Screenshot 2022-08-04 at 02 47 34

Expected Behavior

Pretty simple: don't query the database every single time a user connects to the panel.

Steps to Reproduce

Use a service like https://anonboot.com/ and launch an L7 FREE-FLOOD. Meanwhile, have the Cloudflare proxy mitigate some of the effect of the attack. Even if as little as 4MB bypasses it, almost any CPU would die.

Panel Version

1.10.1

Wings Version

1.7.0

Games and/or Eggs Affected

No response

Docker Image

No response

Error Logs

No response

Is there an existing issue for this?

  • [X] I have searched the existing issues before opening this issue.
  • [X] I have provided all relevant details, including the specific game and Docker images I am using if this issue is related to running a server.
  • [X] I have checked in the Discord server and believe this is a bug with the software, and not a configuration issue with my specific system.

OpenSource03 avatar Aug 04 '22 00:08 OpenSource03

I'm not sure what you expect us to do here. Seems pretty simple to me: more requests means higher load, which leads to higher resource usage.

The panel needs to pull a user's session when they visit the site so it can check if they need to be redirected to the auth pages or from auth pages to the dashboard. Then from there every request might load data relating to servers, an individual server, or hit the underlying node over a websocket (which is unrelated to this).

The information you have provided is absolutely useless in determining if there is an actual issue, which I highly doubt there is. Our documentation covers how to get the software running, not how to tune the performance of the system, database, and PHP-FPM workers. Provide information like the number of requests, how many queries are being run, and actual system specs like CPU and RAM speed.

This might even be considered great performance if you are flooding the panel with tens of thousands of requests per second. If you are using Cloudflare you will likely need to manually mitigate the requests unless they are malicious and have a high enough volume, and most likely you aren't using Pro or Business, which provide more advanced features to mitigate all types of attacks.
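
For anyone trying to collect the request numbers asked for above, here is a rough sketch in Python that counts requests per second from web server access log lines. It assumes the nginx/Apache "combined" log format; the sample lines, IP addresses, and the function name `requests_per_second` are made up for illustration and are not taken from this issue.

```python
import re
from collections import Counter

# Matches the timestamp of a "combined"-style access log line,
# e.g. [04/Aug/2022:02:31:36 +0000]. The log format is an assumption.
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2}) [+-]\d{4}\]")


def requests_per_second(lines):
    """Count how many log lines fall into each one-second bucket."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical sample lines; a real run would read the access log file instead.
SAMPLE_LOG = [
    '203.0.113.5 - - [04/Aug/2022:02:31:36 +0000] "GET / HTTP/1.1" 302 650 "-" "curl/7.81.0"',
    '203.0.113.6 - - [04/Aug/2022:02:31:36 +0000] "GET / HTTP/1.1" 302 650 "-" "curl/7.81.0"',
    '203.0.113.7 - - [04/Aug/2022:02:31:37 +0000] "GET /auth/login HTTP/1.1" 200 1200 "-" "curl/7.81.0"',
]

counts = requests_per_second(SAMPLE_LOG)
peak_second, peak = counts.most_common(1)[0]
print(f"peak: {peak} requests at {peak_second}")
```

Running the same function over the real access log during an attack would give the peak requests per second, which is far more useful for diagnosis than a bandwidth figure alone.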

matthewpi avatar Aug 04 '22 01:08 matthewpi

You understand that no matter what I do, no matter how many rules I apply, unless I give everyone a captcha challenge, any small amount of traffic that bypasses will take the website down. 2MB is enough. It's an incredibly small number of requests.

OpenSource03 avatar Aug 04 '22 01:08 OpenSource03

Also, here's a fact you can't disprove. Open a browser such as Chrome. Open 10 tabs of your panel (possibly 20 if your PC can handle it) and select them all by clicking the first one, then shift-clicking the last one. With them selected, start refreshing them like crazy; you'll see about 30% CPU usage on a mid-range CPU.

If that's not enough to convince you that there's something seriously wrong with the panel, I don't know what will.

OpenSource03 avatar Aug 04 '22 01:08 OpenSource03

The load a system can handle will vary based on its hardware.

A 2 thread VPS with 2 gigs of RAM won't handle much but a basic load. An 8 thread VPS won't handle 4 times the amount of traffic either. Disk IO and memory bandwidth are also considerations. All of these add together to determine what your service can handle.

Each normal connection loading the page is probably only a few kilobytes in actual traffic. So 2 megabytes can easily be several hundred times that.

I don't see how we are in control of what you are hosting on.

parkervcp avatar Aug 04 '22 01:08 parkervcp

Would you be kind enough, then, to explain how I am able to raise the CPU load of a 6-core Epyc VPS with 20GB of RAM, with just Pterodactyl on it, up to 20% just by refreshing those tabs? Does that seem normal to you guys?

OpenSource03 avatar Aug 04 '22 01:08 OpenSource03

The speed of those cores is just as, if not more, important as the quantity. You seem to have glossed over the disk and memory callouts there. Those can also drive usage. You should be checking io stats as well.

parkervcp avatar Aug 04 '22 01:08 parkervcp

Can you please share your PHP FPM and MariaDB configurations?

DaneEveritt avatar Aug 04 '22 01:08 DaneEveritt

Basically, defaults for everything. Any recommendations on where to begin with config changes? Which settings are the most important?

;;;;;;;;;;;;;;;;;;;;;
; FPM Configuration ;
;;;;;;;;;;;;;;;;;;;;;

; All relative paths in this configuration file are relative to PHP's install
; prefix (/usr). This prefix can be dynamically changed by using the
; '-p' argument from the command line.

;;;;;;;;;;;;;;;;;;
; Global Options ;
;;;;;;;;;;;;;;;;;;

[global]
; Pid file
; Note: the default prefix is /var
; Default Value: none
; Warning: if you change the value here, you need to modify systemd
; service PIDFile= setting to match the value here.
pid = /run/php/php8.1-fpm.pid

; Error log file
; If it's set to "syslog", log is sent to syslogd instead of being written
; into a local file.
; Note: the default prefix is /var
; Default Value: log/php-fpm.log
error_log = /var/log/php8.1-fpm.log

; syslog_facility is used to specify what type of program is logging the
; message. This lets syslogd specify that messages from different facilities
; will be handled differently.
; See syslog(3) for possible values (ex daemon equiv LOG_DAEMON)
; Default Value: daemon
;syslog.facility = daemon

; syslog_ident is prepended to every message. If you have multiple FPM
; instances running on the same server, you can change the default value
; which must suit common needs.
; Default Value: php-fpm
;syslog.ident = php-fpm

; Log level
; Possible Values: alert, error, warning, notice, debug
; Default Value: notice
;log_level = notice

; Log limit on number of characters in the single line (log entry). If the
; line is over the limit, it is wrapped on multiple lines. The limit is for
; all logged characters including message prefix and suffix if present. However
; the new line character does not count into it as it is present only when
; logging to a file descriptor. It means the new line character is not present
; when logging to syslog.
; Default Value: 1024
;log_limit = 4096

; Log buffering specifies if the log line is buffered which means that the
; line is written in a single write operation. If the value is false, then the
; data is written directly into the file descriptor. It is an experimental
; option that can potentially improve logging performance and memory usage
; for some heavy logging scenarios. This option is ignored if logging to syslog
; as it has to be always buffered.
; Default value: yes
;log_buffering = no

; If this number of child processes exit with SIGSEGV or SIGBUS within the time
; interval set by emergency_restart_interval then FPM will restart. A value
; of '0' means 'Off'.
; Default Value: 0
;emergency_restart_threshold = 0

; Interval of time used by emergency_restart_interval to determine when
; a graceful restart will be initiated.  This can be useful to work around
; accidental corruptions in an accelerator's shared memory.
; Available Units: s(econds), m(inutes), h(ours), or d(ays)
; Default Unit: seconds
; Default Value: 0
;emergency_restart_interval = 0

; Time limit for child processes to wait for a reaction on signals from master.
; Available units: s(econds), m(inutes), h(ours), or d(ays)
; Default Unit: seconds
; Default Value: 0
;process_control_timeout = 0

; The maximum number of processes FPM will fork. This has been designed to control
; the global number of processes when using dynamic PM within a lot of pools.
; Use it with caution.
; Note: A value of 0 indicates no limit
; Default Value: 0
; process.max = 128

; Specify the nice(2) priority to apply to the master process (only if set)
; The value can vary from -19 (highest priority) to 20 (lowest priority)
; Note: - It will only work if the FPM master process is launched as root
;       - The pool process will inherit the master process priority
;         unless specified otherwise
; Default Value: no set
; process.priority = -19

; Send FPM to background. Set to 'no' to keep FPM in foreground for debugging.
; Default Value: yes
;daemonize = yes

; Set open file descriptor rlimit for the master process.
; Default Value: system defined value
;rlimit_files = 1024

; Set max core size rlimit for the master process.
; Possible Values: 'unlimited' or an integer greater or equal to 0
; Default Value: system defined value
;rlimit_core = 0

; Specify the event mechanism FPM will use. The following is available:
; - select     (any POSIX os)
; - poll       (any POSIX os)
; - epoll      (linux >= 2.5.44)
; - kqueue     (FreeBSD >= 4.1, OpenBSD >= 2.9, NetBSD >= 2.0)
; - /dev/poll  (Solaris >= 7)
; - port       (Solaris >= 10)
; Default Value: not set (auto detection)
;events.mechanism = epoll

; When FPM is built with systemd integration, specify the interval,
; in seconds, between health report notification to systemd.
; Set to 0 to disable.
; Available Units: s(econds), m(inutes), h(ours)
; Default Unit: seconds
; Default value: 10
;systemd_interval = 10

;;;;;;;;;;;;;;;;;;;;
; Pool Definitions ;
;;;;;;;;;;;;;;;;;;;;

; Multiple pools of child processes may be started with different listening
; ports and different management options.  The name of the pool will be
; used in logs and stats. There is no limitation on the number of pools which
; FPM can handle. Your system will tell you anyway :)

; Include one or more files. If glob(3) exists, it is used to include a bunch of
; files from a glob(3) pattern. This directive can be used everywhere in the
; file.
; Relative path can also be used. They will be prefixed by:
;  - the global prefix if it's been set (-p argument)
;  - /usr otherwise
include=/etc/php/8.1/fpm/pool.d/*.conf

Here's my MariaDB configuration. I don't know which file you were referring to, but this is what's in /etc/mysql/mariadb.cnf:

# The MariaDB configuration file
#
# The MariaDB/MySQL tools read configuration files in the following order:
# 0. "/etc/mysql/my.cnf" symlinks to this file, reason why all the rest is read.
# 1. "/etc/mysql/mariadb.cnf" (this file) to set global defaults,
# 2. "/etc/mysql/conf.d/*.cnf" to set global options.
# 3. "/etc/mysql/mariadb.conf.d/*.cnf" to set MariaDB-only options.
# 4. "~/.my.cnf" to set user-specific options.
#
# If the same option is defined multiple times, the last one will apply.
#
# One can use all long options that the program supports.
# Run program with --help to get a list of available options and with
# --print-defaults to see which it would actually understand and use.
#
# If you are new to MariaDB, check out https://mariadb.com/kb/en/basic-mariadb-articles/

#
# This group is read both by the client and the server
# use it for options that affect everything
#
[client-server]
# Port or socket location where to connect
# port = 3306
socket = /run/mysqld/mysqld.sock

# Import all .cnf files from configuration directory
!includedir /etc/mysql/conf.d/
!includedir /etc/mysql/mariadb.conf.d/

[mysqld]
bind-address            = 0.0.0.0

OpenSource03 avatar Aug 04 '22 02:08 OpenSource03

Probably not really the solution you are looking for, but if you want to protect against someone DDoSing the panel (and you're just using it yourself), you could maybe whitelist some IPs in your firewall so that nobody else can reach the panel?

Hermelijn15 avatar Aug 04 '22 15:08 Hermelijn15

I'm not using it only for myself, of course. If I were, this would be the least important issue.

For public use, there's no WAF that can filter 100% of an attack, and even if as few as 50 connections per second bypass it, that causes tremendous load on the system. 50 is nothing compared to 10k per second, which is a typical low-end L7 attack.

On top of that, I'm again pointing out that I was able to cause 20+% CPU load just by refreshing 10-20 tabs. What if there are 10 people doing that?

Also, just to point out: I tried changing the PHP-FPM configuration, increasing the number of workers, max requests per worker, max servers, children and everything else, but none of the settings seemed to help.

I also tried many MariaDB configurations, including several high-performance ones from GitHub, applying and removing various settings from across the internet, but nothing lowers the load on the system in any way.

I would highly appreciate it if anyone could recommend settings for MariaDB and PHP-FPM that would reduce the load on the system.

OpenSource03 avatar Aug 04 '22 16:08 OpenSource03

Be aware that a request to https://panel.example.com/ without authentication that yields a 302 redirect to https://panel.example.com/auth/login is just 650B in size.

Therefore "just 2MB" equals 2MB / 650B = ~3000 Req/s.
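
The arithmetic can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the "2MB" figure means 2 MB per second of traffic reaching the origin, uses decimal megabytes, and the function name `estimate_request_rate` is made up for illustration.

```python
def estimate_request_rate(bandwidth_bytes_per_s: float, bytes_per_request: float) -> float:
    """Estimate requests per second from bandwidth and average bytes per request."""
    return bandwidth_bytes_per_s / bytes_per_request


# Assumption: "2MB" is 2 MB/s of traffic that got past the proxy (decimal megabytes).
BYPASS_BANDWIDTH = 2_000_000
# Size of the unauthenticated 302 redirect quoted above.
REDIRECT_SIZE = 650

rate = estimate_request_rate(BYPASS_BANDWIDTH, REDIRECT_SIZE)
print(f"{rate:.0f} requests per second")
```

With these inputs the estimate comes out at roughly 3077 requests per second, which matches the ~3000 Req/s figure.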

If the attacker is aware of that, and does not follow redirects, this makes it very easy to DDoS the app.

I also do not understand why those requests hit the database. If no cookie is present, the panel should forward directly, without reaching out to the database. With PHP being PHP, it might need to load configuration and so on for each request. That stuff is usually cached, though, which should make it pretty fast.
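
To illustrate the idea of redirecting without a lookup when no cookie is present, here is a toy sketch in Python. It is not how Pterodactyl or Laravel actually handle sessions; the `SessionStore` class, the `handle_request` function, and the cookie name `session` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SessionStore:
    """Stand-in for a database- or Redis-backed session store that counts lookups."""
    sessions: dict = field(default_factory=dict)
    lookups: int = 0

    def load(self, session_id):
        self.lookups += 1
        return self.sessions.get(session_id)


def handle_request(cookies: dict, store: SessionStore):
    """Return (status, location) for a request to the panel root."""
    session_id = cookies.get("session")  # hypothetical cookie name
    if session_id is None:
        # No cookie means the visitor cannot be logged in,
        # so redirect without touching the session store at all.
        return 302, "/auth/login"
    user = store.load(session_id)
    if user is None:
        # Unknown or expired session: redirect after a single lookup.
        return 302, "/auth/login"
    return 200, "/"


store = SessionStore(sessions={"abc123": "alice"})
print(handle_request({}, store), store.lookups)
print(handle_request({"session": "abc123"}, store), store.lookups)
```

In this sketch, the cookie-less request is answered with a redirect while the lookup counter stays at zero; only the request carrying a session cookie reaches the store. An attacker could still send a bogus cookie to force a lookup, so this would reduce load from naive floods only.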

schrej avatar Aug 04 '22 16:08 schrej

That's my point as well. I cannot understand why those requests are hitting the database so hard. MariaDB usually handles small requests fine even at a rate of 3000/s, as you mentioned, although I'll verify and report the exact number of connections per second soon. I'm more than sure it is possible to use cookies, Redis, or anything else for sessions so that the app doesn't need to hammer the database every single time there's a new connection. The PHP-FPM load I can understand, but bots hitting the database every time they connect I simply cannot.

OpenSource03 avatar Aug 04 '22 16:08 OpenSource03

We should probably analyse (Laravel has good debugging tools) how many requests are made to the database during an unauthenticated request and see if that can be optimized.

schrej avatar Aug 04 '22 16:08 schrej