
Zbxdb plugin leading to performance issues in RDS

Open yatharth27 opened this issue 1 year ago • 7 comments

Certain SQL queries in the zbxdb plugin are consuming up to 80-90% of memory inside RDS, as reported by my team. Is there a way we can figure out why these queries take up such a huge chunk of memory in RDS?

yatharth27 avatar Sep 28 '23 08:09 yatharth27

Normal performance analysis should be done. First find the query and check it with a regular client. Make sure that the object statistics are up to date.
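For a MySQL-family database, for example, that check could look like the sketch below from any regular client (the query in the EXPLAIN is only a generic example, and the schema/table names in the ANALYZE statement are placeholders, not taken from this issue):

-- Look at the execution plan of a suspect query from a regular client.
EXPLAIN SELECT table_schema, count(*)
          FROM information_schema.TABLES
         GROUP BY table_schema;

-- Refresh the optimizer statistics for the objects the query touches
-- (schema and table names here are placeholders).
ANALYZE TABLE mydb.mytable;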

ikzelf avatar Sep 28 '23 08:09 ikzelf

So we have identified the queries that are taking up 80-90% of the memory. These are the ones below:

SELECT concat('p_ts[',table_schema,',,filesize]'),
       sum(data_length + index_length + data_free)
  FROM information_schema.TABLES
 GROUP BY table_schema
UNION ALL
SELECT concat('p_ts[',table_schema,',,usedbytes]'),
       sum(data_length + index_length)
  FROM information_schema.TABLES
 GROUP BY table_schema

And

SELECT performance_schema.events_waits_summary_global_by_event_name.EVENT_NAME AS events,
       performance_schema.events_waits_summary_global_by_event_name.COUNT_STAR AS total,
       performance_schema.events_waits_summary_global_by_event_name.SUM_TIMER_WAIT AS total_latency,
       performance_schema.events_waits_summary_global_by_event_name.AVG_TIMER_WAIT AS avg_latency,
       performance_schema.events_waits_summary_global_by_event_name.MAX_TIMER_WAIT AS `

So, can we try to modify these queries, and can you confirm that the plugin is going to work in the same manner it previously did?

yatharth27 avatar Oct 09 '23 10:10 yatharth27

As long as you make sure that the queries report their results in the same way as before, you can use any query that returns the metrics you want. As far as I can see this is for MySQL? You can check by running the SQL in a regular client. I found https://bugs.mysql.com/bug.php?id=97935 — does the query allocate all the memory immediately, or does it grow after repeating the query?
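One way to see whether the memory grows with repeats is to watch the server's memory instrumentation from a regular client. The sketch below assumes MySQL 5.7+/Aurora MySQL with the memory/% instruments enabled (the default in 8.0; on 5.7 they may need to be enabled in performance_schema.setup_instruments):

-- Total instrumented memory currently allocated by the server.
SELECT * FROM sys.memory_global_total;

-- Largest allocations per instrument; run this, repeat the suspect query a
-- few times, then run it again and compare current and high-water values.
SELECT event_name,
       current_number_of_bytes_used AS current_bytes,
       high_number_of_bytes_used    AS high_water_bytes
  FROM performance_schema.memory_summary_global_by_event_name
 ORDER BY current_number_of_bytes_used DESC
 LIMIT 10;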

zbxdb normally tries to use and re-use a single connection, since connecting and disconnecting for many databases is a very expensive operation. If a memory leak is playing a role, that leak should be fixed, but before that it should be found. The zbxdb.py code itself does not have a leak: after hundreds of thousands of iterations against an Oracle or PostgreSQL database there is no growth in its memory allocation. zbxdb also monitors its own memory size, and it is in the template, so you can check that too.

ikzelf avatar Oct 09 '23 11:10 ikzelf

How do I find the backend MySQL query for a metric name in the zabbix_out folder, for a specific MySQL Aurora DB?

yatharth27 avatar Oct 25 '23 06:10 yatharth27

Check the etc/zbxdb_checks/mysql/ folder for the db version you need. The queries are in there.

ikzelf avatar Oct 25 '23 07:10 ikzelf

So, after having a word with the team: the query below is running every hour, and that is leading to high memory usage. Can we schedule this query to run once every 5 days, or on the weekends (Saturday/Sunday)?

SELECT concat('p_ts[',table_schema,',,filesize]'),
       sum(data_length + index_length + data_free)
  FROM information_schema.TABLES
 GROUP BY table_schema
UNION ALL
SELECT concat('p_ts[',table_schema,',,usedbytes]'),
       sum(data_length + index_length)
  FROM information_schema.TABLES
 GROUP BY table_schema

yatharth27 avatar Nov 02 '23 09:11 yatharth27

I am not very familiar with MySQL, but I think a better solution is to check the optimiser statistics for the information_schema. You can also set the interval for the query to a longer time. To do that, add a new chapter to the cfg file for your database make and version and put that query in it. Just read the file and adjust it to your liking. Currently it is in:

[checks_60m]
minutes: 60
p_ts: SELECT concat('p_ts[',table_schema,',,filesize]'), sum( data_length + index_length + data_free) FROM information_schema.TABLES GROUP BY table_schema union all SELECT concat('p_ts[',table_schema,',,usedbytes]'), sum( data_length + index_length) FROM information_schema.TABLES GROUP BY table_schema

Nothing prevents you from adding or changing to a longer interval. The chapter name is just a name; the minutes: XXXX directive is the one that controls the repeat interval.
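For example, running the p_ts query once every 5 days (7200 minutes) could look like the chapter below. This is only a sketch: the chapter name is arbitrary, and the p_ts line would then be removed from [checks_60m] so the query does not run at both intervals.

[checks_7200m]
minutes: 7200
p_ts: SELECT concat('p_ts[',table_schema,',,filesize]'), sum( data_length + index_length + data_free) FROM information_schema.TABLES GROUP BY table_schema union all SELECT concat('p_ts[',table_schema,',,usedbytes]'), sum( data_length + index_length) FROM information_schema.TABLES GROUP BY table_schema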

ikzelf avatar Nov 02 '23 10:11 ikzelf