matomo
matomo copied to clipboard
specify index hint for queryConversionsByPageView to optimize LogAggregator Performance
Description:
Currently the LogAggregator might be very slow in large environments of with millions of visitors each day. This mainly comes back to one sql query, which lacks of proper index hints. Settings this optimized the performance from over two weeks per execution, to like 0.9 seconds.
Review
- [ ] Functional review done
- [ ] Potential edge cases thought about (behavior of the code with strange input, with strange internal state or possible interactions with other Matomo subsystems) => INDEX ist supported by MySQL + MariaDB and available for all supported version. The variant used in Matomo 5.0 dev branchs DBHelper (JOIN_PREFIX) would break on mariadb systems.
- [ ] Usability review done (is anything maybe unclear or think about anything that would cause people to reach out to support)
- [ ] Security review done
- [ ] Wording review done
- [ ] Code review done
- [ ] Tests were added if useful/possible
- [ ] Reviewed for breaking changes
- [ ] Developer changelog updated if needed
- [ ] Documentation added if needed
- [ ] Existing documentation updated if needed
@Fabian123333 thank you for opening the PR. As you created the PR from 4.x branch, can you please confirm that it is not an issue on 5.x or it doesn't apply there or anything like that?
Could you also provide more information on the system you experienced the speed up on, e.g. was it MariaDb or MySQL server? What versions? It would help us replicate/confirm the performance increase. Thanks!
@Fabian123333 thank you for opening the PR. As you created the PR from 4.x branch, can you please confirm that it is not an issue on 5.x or it doesn't apply there or anything like that?
On the 5.x branch the DataAccess has been refactored, offering a separat addJoinPrefixHintToQuery method to support Index hints in the DbHelper Class: https://github.com/matomo-org/matomo/blob/5.x-dev/core/DbHelper.php#L365
This uses JOIN_PREFIX, fully supported by MySQL 8, but breaks compatibility with MariaDB versions (as USE INDEX
is the platform independent version for optimizer hints). So this possibly would need refactoring if the optimization should also work for MariaDB, but that's another topic I might take a look on the next days.
Could you also provide more information on the system you experienced the speed up on, e.g. was it MariaDb or MySQL server? What versions? It would help us replicate/confirm the performance increase. Thanks!
The performance benefits have been measured on MariaDB 10.5 (build version 10.5.21-MariaDB-0+deb11u1).
The produced queries looked as follows:
SELECT /*+ MAX_EXECUTION_TIME(3600000) */
yyy.idvisit AS idvisit,
2 AS idgoal,
1 AS `type`,
yyy.idaction AS idaction,
COUNT(*) AS `1`,
ROUND(SUM(yyy.revenue_total),2) AS `2`,
COUNT(yyy.idvisit) AS `3`,
ROUND(SUM(yyy.revenue_subtotal),2) AS `4`,
ROUND(SUM(yyy.revenue_tax),2) AS `5`,
ROUND(SUM(yyy.revenue_shipping),2) AS `6`,
ROUND(SUM(yyy.revenue_discount),2) AS `7`,
SUM(yyy.items) AS `8`,
yyy.pages_before AS `9`,
SUM(yyy.attribution) AS `10`,
COUNT(*) AS `12`,
ROUND(SUM(yyy.revenue),2) AS `15`
FROM (
SELECT
num_total AS pages_before,
1 / num_total AS attribution,
r.idvisit AS idvisit,
lac.idaction AS idaction,
lvcon.revenue AS revenue_total,
1 / num_total * lvcon.revenue AS revenue,
1 / num_total * lvcon.revenue_subtotal AS revenue_subtotal,
1 / num_total * lvcon.revenue_tax AS revenue_tax,
1 / num_total * lvcon.revenue_shipping AS revenue_shipping,
1 / num_total * lvcon.revenue_discount AS revenue_discount,
1 / num_total * lvcon.items AS items
FROM (
SELECT /* sites 9 */ /* 2023-12-14,2023-12-15 */ /* Actions */ /* trigger = CronArchive */
log_conversion.idvisit, COUNT(*) AS num_total
FROM
matomo_log_conversion AS log_conversion RIGHT JOIN matomo_log_link_visit_action AS log_vpast ON log_conversion.idvisit = log_vpast.idvisit LEFT JOIN matomo_log_action AS lac_past ON log_vpast.idaction_url = lac_past.idaction
WHERE
log_conversion.server_time >= '2023-12-14 23:00:00'
AND log_conversion.server_time <= '2023-12-15 22:59:59'
AND log_conversion.idsite IN ('9')AND log_conversion.idgoal = 2
AND log_vpast.server_time <= log_conversion.server_time
AND lac_past.type = 1
GROUP BY
log_conversion.idvisit
) AS r
LEFT JOIN matomo_log_conversion lvcon ON lvcon.idgoal = 2 AND lvcon.idvisit = r.idvisit
RIGHT JOIN matomo_log_link_visit_action logv USE INDEX (index_idvisit) ON logv.idvisit = r.idvisit
LEFT JOIN matomo_log_action lac ON logv.idaction_url = lac.idaction
WHERE logv.server_time >= '2023-12-14 23:00:00'
AND logv.server_time <= '2023-12-15 22:59:59'
AND logv.idsite IN (9)
AND lac.type = 1
AND logv.server_time <= lvcon.server_time
) AS yyy
GROUP BY yyy.idaction
ORDER BY `9` DESC
Resulting in a runtime of 0.909sec, compared to like 14 days of runtime without the optimizer hints.
This issue is in "needs review" but there has been no activity for 7 days. ping @matomo-org/core-reviewers
This issue is in "needs review" but there has been no activity for 7 days. ping @matomo-org/core-reviewers
This issue is in "needs review" but there has been no activity for 7 days. ping @matomo-org/core-reviewers
This PR was last updated more than one month ago, maybe it's time to close it. Please check if there is anything we still can do or close this PR. ping @matomo-org/core-reviewers