Opening OCPP tags page takes forever
Checklist
- [x] I checked other issues already, but found no answer/solution
- [x] I checked the documentation and wiki, but found no answer/solution
- [x] I am running the latest version and the issue still occurs
- [x] I am sure that this issue is about SteVe (and not about the charging station software or something unrelated to SteVe)
Specifications
SteVe Version : at least with 3.5.0 and 3.6.0 (also on 3.4.x, but I can't remember what minor version it was)
Operating system : happens on current Arch Linux, Ubuntu 22.04 LTS and whatever the default Docker compose uses in its container
JDK : tested with 14, 18, 20 and whatever the default Docker compose uses in its container
Database : MariaDB 10.4
Expected Behavior
Opening the OCPP tags page should not be slow.
Actual Behavior
Opening the page for OCPP tags takes a very long time (up to multiple minutes at some point), and the database runs at 100% CPU while waiting. The problem gets worse over time. It works just fine initially, even with hundreds of OCPP tags.
Steps to Reproduce the Problem
I am guessing here, but it happened on three different installations, gradually getting worse over the system's lifetime.
- Have around 50 to 100 OCPP tags
- Have an increasing number of transactions in the system (100+)
Additional context
I suspect the issue is the increasing number of transactions, which results in slow loading of the OCPP tags page because of complex queries trying to figure out whether a tag is in a transaction or not.
You don't share enough about your infrastructure. What are the characteristics of the server that hosts SteVe? Same question for MySQL. How do you monitor process load on the servers?
The current 3.6.0 installation is running on an Ubuntu 22.04 B4ms Azure VM using Docker Compose with everything at default (except passwords and usernames, of course), but the same issue was present on two different machines with Ubuntu 22.04 and Arch Linux, with the application started directly from the jar file and a manually set-up MariaDB. SteVe and the database are on the same instance in all of those cases.
Process load has simply been observed with the top command. While waiting for the page to load, MariaDB hogs one core (100%) until the page has loaded. Again, this is a gradually emerging problem. A relatively fresh install has no problems whatsoever. The load and time to load seem to grow with the number of transactions that have been registered by the system.
Current metrics within the system: around 100 OCPP tags and currently at around 1000 completed transactions. It doesn't seem to matter if there are any transactions active or not. Page load currently takes 17 seconds according to browser's dev tools. MariaDB is at 100% CPU utilization for those 17 seconds, then back to normal (basically 0%, with some "spikes" to 2% here and there). Page load time increases over time, hence I suspect the relation to the transactions. All other pages, including transactions, load normally, as in almost instantly. On an older installation that had almost ten times the number of transactions, the page load for OCPP tags took around two minutes.
I had similar performance issues on my system. I changed the sql-code of the ocpp_tag_activity view. I removed the transaction view call and replaced it with a join of transaction_start/transaction_stop. It's not exactly the same as before, so there may be some side effects I'm not aware of! If you want to try it on a test system, here's the code: https://github.com/fnkbsi/steve/blob/OcppTagActivity_noTransactionView/src/main/resources/db/migration/V1_0_5__update.sql
> Current metrics within the system: around 100 OCPP tags and currently at around 1000 completed transactions. It doesn't seem to matter if there are any transactions active or not. Page load currently takes 17 seconds according to browser's dev tools.
At roughly 1800 transactions the page load is now up to 65 seconds already.
Can you provide the slow-query.log of MariaDB? (You may have to activate slow-query logging first.)
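For reference, slow-query logging can be enabled at runtime without restarting the server. A sketch using standard MariaDB server system variables (the log file path is just an example):

```sql
-- Enable slow-query logging at runtime (resets on server restart;
-- add the equivalent settings to my.cnf to make them permanent).
SET GLOBAL slow_query_log = 1;
SET GLOBAL long_query_time = 1;  -- log queries slower than 1 second
SET GLOBAL slow_query_log_file = '/var/log/mysql-slow.log';
```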
This is my slow-query.log:
```
Reading mysql slow query log from /var/log/mysql-slow.log
Count: 1  Time=329.89s (329s)  Lock=0.00s (0s)  Rows_sent=173.0 (173),
Rows_examined=232674298.0 (232674298), Rows_affected=0.0 (0), steve[steve]@localhost
  select stevedb.ocpp_tag_activity.ocpp_tag_pk, parent.ocpp_tag_pk,
    stevedb.ocpp_tag_activity.id_tag, stevedb.ocpp_tag_activity.parent_id_tag,
    stevedb.ocpp_tag_activity.expiry_date, stevedb.ocpp_tag_activity.in_transaction,
    stevedb.ocpp_tag_activity.blocked, stevedb.ocpp_tag_activity.max_active_transaction_count,
    stevedb.ocpp_tag_activity.active_transaction_count, stevedb.ocpp_tag_activity.note
  from stevedb.ocpp_tag_activity
  left outer join stevedb.ocpp_tag_activity as parent
    on parent.id_tag = stevedb.ocpp_tag_activity.parent_id_tag
  where ((stevedb.ocpp_tag_activity.expiry_date is null
    or stevedb.ocpp_tag_activity.expiry_date > TIMESTAMP'2023-10-13 12:29:52')
    and stevedb.ocpp_tag_activity.blocked = 0)
```
We have about 185 tags configured. The request always times out.
this query examines almost 233 million rows.
can you please report the row counts of the transaction_start and transaction_stop tables?
```
MariaDB [stevedb]> select count(*) from transaction_start;
+----------+
| count(*) |
+----------+
|     8891 |
+----------+
1 row in set (0.001 sec)

MariaDB [stevedb]> select count(*) from transaction_stop;
+----------+
| count(*) |
+----------+
|     8900 |
+----------+
1 row in set (0.001 sec)
```
Potentially this may help.

For the ocpp_tag table:

```sql
CREATE INDEX idx_ocpp_tag_expiry_date ON ocpp_tag (expiry_date);
```

For the transaction_start table:

```sql
CREATE INDEX idx_transaction_start_connector_pk ON transaction_start (connector_pk);
CREATE INDEX idx_transaction_start_id_tag ON transaction_start (id_tag);
CREATE INDEX idx_transaction_start_start_timestamp ON transaction_start (start_timestamp);
```
we have these indexes already
i can reproduce the slow query with my referenced code on a MacBook Pro 2021 with M1 Pro and 32 GB memory.
preparation: insert 1000 tags and 500 charge boxes, start 10K transactions, and stop roughly half of them later.
the read query (the one used by the ocpp tags page) takes 100 to 110 seconds.
i will look into this.
i think i found the issue.
the sql query of the current ocpp_tag_activity view is:
```sql
SELECT
    ocpp_tag.*,
    COALESCE(tx_activity.active_transaction_count, 0) AS 'active_transaction_count',
    CASE WHEN (active_transaction_count > 0) THEN TRUE ELSE FALSE END AS 'in_transaction'
FROM ocpp_tag
LEFT JOIN
    (SELECT id_tag, COUNT(id_tag) AS 'active_transaction_count'
     FROM transaction
     WHERE stop_timestamp IS NULL
       AND stop_value IS NULL
     GROUP BY id_tag) tx_activity
ON ocpp_tag.id_tag = tx_activity.id_tag;
```
notice, i am left-joining ocpp_tag with a custom, temporary table derived from transaction that gives me id tags and their active_transaction_counts. i was aiming to derive the actual, to-the-point data i need in a temp table and join with that; otherwise, the resulting joined table would have been too wide. however, i hypothesise that mysql cannot use indexes with this derived table (even though the underlying transaction table has them).
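One way to check this hypothesis (a sketch, not from the original thread) is to run EXPLAIN on the view's query. A row with `select_type = DERIVED` joined via a `<derivedN>` table without usable keys would support the theory that the materialized subquery is scanned without index support:

```sql
-- Hypothetical check: EXPLAIN shows the execution plan for the view's query.
-- Look for a DERIVED row and a join against <derived2> with key = NULL,
-- indicating the temporary table is joined without the base table's indexes.
EXPLAIN
SELECT ocpp_tag.*, tx_activity.active_transaction_count
FROM ocpp_tag
LEFT JOIN
    (SELECT id_tag, COUNT(id_tag) AS active_transaction_count
     FROM `transaction`
     WHERE stop_timestamp IS NULL
       AND stop_value IS NULL
     GROUP BY id_tag) tx_activity
ON ocpp_tag.id_tag = tx_activity.id_tag;
```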
therefore, i inverted the approach: i join ocpp_tag directly with the barebones transaction table, accept that the result will be a wide table, and group-by and filter on this resulting table. so, the new query becomes:
```sql
select `o`.*,
       count(`t`.`id_tag`) AS `active_transaction_count`,
       case when count(`t`.`id_tag`) > 0 then 1 else 0 end AS `in_transaction`,
       case when `o`.`max_active_transaction_count` = 0 then 1 else 0 end AS `blocked`
from `ocpp_tag` `o` left join `transaction` `t` on (
    `o`.`id_tag` = `t`.`id_tag` and
    `t`.`stop_timestamp` is null and
    `t`.`stop_value` is null)
group by
    `o`.`ocpp_tag_pk`,
    `o`.`id_tag`,
    `o`.`parent_id_tag`,
    `o`.`expiry_date`,
    `o`.`max_active_transaction_count`,
    `o`.`note`
```
this new version of the query returns results in less than 100 ms. i could not detect an altered behaviour with the new query either (meaning: it returns the same data set).
if you want to test, please replace the view by executing:
```sql
CREATE OR REPLACE VIEW ocpp_tag_activity AS
select `o`.*,
       count(`t`.`id_tag`) AS `active_transaction_count`,
       case when count(`t`.`id_tag`) > 0 then 1 else 0 end AS `in_transaction`,
       case when `o`.`max_active_transaction_count` = 0 then 1 else 0 end AS `blocked`
from `ocpp_tag` `o` left join `transaction` `t` on (
    `o`.`id_tag` = `t`.`id_tag` and
    `t`.`stop_timestamp` is null and
    `t`.`stop_value` is null)
group by
    `o`.`ocpp_tag_pk`,
    `o`.`id_tag`,
    `o`.`parent_id_tag`,
    `o`.`expiry_date`,
    `o`.`max_active_transaction_count`,
    `o`.`note`
```
i will wait for your feedback, and then roll this out as a proper database migration.
@goekay: I'm curious how my solution (link in a previous comment) is doing on your test setup, because on my dev system with far fewer tags (only 87) and transactions (1700+), the query from @kibeki with the new ocpp_tag_activity view was slower than the original (V0_9_9_update.sql). (I guess the new one will scale much better, though.)
> the query from @kibeki with the new ocpp_tag_activity view was slower than the original (V0_9_9_update.sql)
can you quantify this? i shared the code that you can use to benchmark.
Hi Goekay, now it is super fast. The "OCPP Tag Overview" loads as expected for our 186 tags. The results are shown directly after choosing the menu item. Many thanks!
I'm having some trouble with your benchmark code, so for a quick reply, here is my setup:
- Machine: Intel(R) Core(TM) i7-3632QM CPU @ 2.20GHz, 8 GB RAM
- MariaDB 10.6.5; 87 OCPP tags; 1878 transactions
- DB management: HeidiSQL

Executing the query in HeidiSQL:
- ocpp_tag_activity (V0_9_9_update.sql): 0.297 s
- ocpp_tag_activity (new): 0.344 s
- ocpp_tag_activity (my version): 0.015 s
Update: 38568 transactions, 5000 OCPP tags, 2500 charge boxes. Then running the benchmark without the insert part, twice per ocpp_tag_activity version:
- V0_9_9: PT1.295S, PT1.197S
- V1_0_4: PT1.654S, PT1.598S
- my solution: PT1.029S, PT0.890S
I'm not sure whether this will be considered a separate issue, so I'll comment here.
The same behavior is noticeable in the transaction view. We noticed it when the API call to get transaction details started taking up to 3 seconds. This issue persists on a production dataset with ~1000 transactions, with MariaDB 10.6, both in Kubernetes on Intel and in Docker on M1.
The same trick as in https://github.com/steve-community/steve/issues/1219#issuecomment-1762370911 seems to work, without any changes in the result. Since ocpp_tag_activity references transaction, its performance also increased. In addition, with the current transaction declaration the access time is constant, but with the new declaration the result is cached and consecutive queries are quicker.
Old transaction declaration:
```sql
SELECT
    tx1.transaction_pk, tx1.connector_pk, tx1.id_tag,
    tx1.event_timestamp AS 'start_event_timestamp', tx1.start_timestamp, tx1.start_value,
    tx2.event_actor AS 'stop_event_actor', tx2.event_timestamp AS 'stop_event_timestamp',
    tx2.stop_timestamp, tx2.stop_value, tx2.stop_reason
FROM transaction_start tx1
LEFT JOIN (
    SELECT s1.*
    FROM transaction_stop s1
    WHERE s1.event_timestamp = (SELECT MAX(event_timestamp) FROM transaction_stop s2
                                WHERE s1.transaction_pk = s2.transaction_pk)
    GROUP BY s1.transaction_pk, s1.event_timestamp) tx2
ON tx1.transaction_pk = tx2.transaction_pk;
```
Result: 1051 rows in set (1.588 sec)
Updated transaction declaration:
```sql
SELECT
    tx1.transaction_pk, tx1.connector_pk, tx1.id_tag,
    tx1.event_timestamp AS 'start_event_timestamp', tx1.start_timestamp, tx1.start_value,
    tx2.event_actor AS 'stop_event_actor', tx2.event_timestamp AS 'stop_event_timestamp',
    tx2.stop_timestamp, tx2.stop_value, tx2.stop_reason
FROM transaction_start tx1
LEFT JOIN transaction_stop tx2
    ON tx1.transaction_pk = tx2.transaction_pk
    AND tx2.event_timestamp = (SELECT MAX(event_timestamp) FROM transaction_stop s2
                               WHERE tx2.transaction_pk = s2.transaction_pk)
GROUP BY tx2.transaction_pk, tx2.event_timestamp;
```
Result: 1051 rows in set (0.011 sec)
Comparison of `SELECT * FROM ocpp_tag_activity;`:
- Old ocpp_tag_activity, old transaction: 1.606 sec
- Old ocpp_tag_activity, new transaction: 0.026 sec
- New ocpp_tag_activity, old transaction: 1.581 sec
- New ocpp_tag_activity, new transaction: 0.021 sec
Note that this is not an exhaustive test, and the fix for ocpp_tag_activity proved very beneficial in other requests, so I believe both views should be updated to exclude the JOIN-on-SELECT construction, as it is very suboptimal. Thanks for your consideration.
@NikSays thanks for your investigation, discovery and bringing this topic back to my attention. i have created the PR for the previous improvement and will integrate your suggestion as well.
@NikSays i think there is an issue with your updated transaction:
as a consequence of the change, you start applying `GROUP BY tx2.transaction_pk, tx2.event_timestamp` on the joined/global table, which does not handle open/active transactions properly. in fact, they are being filtered out because tx2.event_timestamp will be NULL for them.
the group-by has unwanted side effects when taken one level higher (to the joined/global table) because it starts to act as a filter on the right side of the LEFT JOIN: we have entries with valued tx1.* fields but NULL tx2.* fields, so it groups around (NULL, NULL), which effectively includes only the first active transaction in the result set.
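This collapse can be illustrated with a constructed example (hypothetical demo tables, not part of the SteVe schema): open transactions have no matching transaction_stop row, so after the LEFT JOIN their right-side columns are all NULL, and grouping by those columns merges every open transaction into a single (NULL, NULL) group:

```sql
-- Hypothetical minimal reproduction of the GROUP BY issue.
CREATE TEMPORARY TABLE demo_start (transaction_pk INT);
CREATE TEMPORARY TABLE demo_stop (transaction_pk INT, event_timestamp DATETIME);
INSERT INTO demo_start VALUES (1), (2);  -- two open transactions, no stop rows

SELECT t.transaction_pk, t.event_timestamp, COUNT(*) AS rows_in_group
FROM demo_start s
LEFT JOIN demo_stop t ON s.transaction_pk = t.transaction_pk
GROUP BY t.transaction_pk, t.event_timestamp;
-- Returns a single (NULL, NULL) group covering both open transactions,
-- instead of one row per open transaction.
```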
actually, i realize that the whole GROUP BY tx2.transaction_pk, tx2.event_timestamp can be dropped, because there is no need for it. the following query
```sql
SELECT
    tx1.transaction_pk, tx1.connector_pk, tx1.id_tag,
    tx1.event_timestamp AS 'start_event_timestamp', tx1.start_timestamp, tx1.start_value,
    tx2.event_actor AS 'stop_event_actor', tx2.event_timestamp AS 'stop_event_timestamp',
    tx2.stop_timestamp, tx2.stop_value, tx2.stop_reason
FROM transaction_start tx1
LEFT JOIN transaction_stop tx2
    ON tx1.transaction_pk = tx2.transaction_pk
    AND tx2.event_timestamp = (SELECT MAX(event_timestamp) FROM transaction_stop s2
                               WHERE tx2.transaction_pk = s2.transaction_pk)
```
should be good enough for the view. the join with transaction_stop using transaction_pk and the latest event_timestamp for that transaction_pk contains enough coordinates to select the latest stop event.
WDYT?
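For anyone who wants to try this simplified variant in place, the same CREATE OR REPLACE VIEW approach used earlier in the thread should apply (an untested sketch; the view name `transaction` is taken from the thread's "transaction declaration" discussion):

```sql
-- Untested sketch: apply the simplified query as the `transaction` view.
CREATE OR REPLACE VIEW `transaction` AS
SELECT
    tx1.transaction_pk, tx1.connector_pk, tx1.id_tag,
    tx1.event_timestamp AS 'start_event_timestamp', tx1.start_timestamp, tx1.start_value,
    tx2.event_actor AS 'stop_event_actor', tx2.event_timestamp AS 'stop_event_timestamp',
    tx2.stop_timestamp, tx2.stop_value, tx2.stop_reason
FROM transaction_start tx1
LEFT JOIN transaction_stop tx2
    ON tx1.transaction_pk = tx2.transaction_pk
    AND tx2.event_timestamp = (SELECT MAX(event_timestamp) FROM transaction_stop s2
                               WHERE tx2.transaction_pk = s2.transaction_pk);
```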