matomo icon indicating copy to clipboard operation
matomo copied to clipboard

specify index hint for queryConversionsByPageView to optimize LogAggregator Performance

Open Fabian123333 opened this issue 1 year ago • 6 comments

Description:

Currently the LogAggregator might be very slow in large environments of with millions of visitors each day. This mainly comes back to one sql query, which lacks of proper index hints. Settings this optimized the performance from over two weeks per execution, to like 0.9 seconds.

Review

Fabian123333 avatar Dec 15 '23 15:12 Fabian123333

@Fabian123333 thank you for opening the PR. As you created the PR from 4.x branch, can you please confirm that it is not an issue on 5.x or it doesn't apply there or anything like that?

michalkleiner avatar Dec 16 '23 09:12 michalkleiner

Could you also provide more information on the system you experienced the speed up on, e.g. was it MariaDb or MySQL server? What versions? It would help us replicate/confirm the performance increase. Thanks!

michalkleiner avatar Dec 16 '23 09:12 michalkleiner

@Fabian123333 thank you for opening the PR. As you created the PR from 4.x branch, can you please confirm that it is not an issue on 5.x or it doesn't apply there or anything like that?

On the 5.x branch the DataAccess has been refactored, offering a separat addJoinPrefixHintToQuery method to support Index hints in the DbHelper Class: https://github.com/matomo-org/matomo/blob/5.x-dev/core/DbHelper.php#L365

This uses JOIN_PREFIX, fully supported by MySQL 8, but breaks compatibility with MariaDB versions (as USE INDEX is the platform independent version for optimizer hints). So this possibly would need refactoring if the optimization should also work for MariaDB, but that's another topic I might take a look on the next days.

Could you also provide more information on the system you experienced the speed up on, e.g. was it MariaDb or MySQL server? What versions? It would help us replicate/confirm the performance increase. Thanks!

The performance benefits have been measured on MariaDB 10.5 (build version 10.5.21-MariaDB-0+deb11u1).

The produced queries looked as follows:

SELECT  /*+ MAX_EXECUTION_TIME(3600000) */ 
          yyy.idvisit AS idvisit,
          2 AS idgoal,
          1 AS `type`,
          yyy.idaction AS idaction,
          COUNT(*) AS `1`,     
          ROUND(SUM(yyy.revenue_total),2) AS `2`,          
          COUNT(yyy.idvisit) AS `3`,                   
          ROUND(SUM(yyy.revenue_subtotal),2) AS `4`,                 
          ROUND(SUM(yyy.revenue_tax),2) AS `5`,
          ROUND(SUM(yyy.revenue_shipping),2) AS `6`,
          ROUND(SUM(yyy.revenue_discount),2) AS `7`,
          SUM(yyy.items) AS `8`,
          yyy.pages_before AS `9`,                    
          SUM(yyy.attribution) AS `10`,                    
          COUNT(*) AS `12`,              
          ROUND(SUM(yyy.revenue),2) AS `15`         
        FROM (
          SELECT
            num_total AS pages_before,
            1 / num_total AS attribution,
            r.idvisit AS idvisit,
            lac.idaction AS idaction,
            lvcon.revenue AS revenue_total,
            1 / num_total * lvcon.revenue AS revenue,
            1 / num_total * lvcon.revenue_subtotal AS revenue_subtotal,
            1 / num_total * lvcon.revenue_tax AS revenue_tax,
            1 / num_total * lvcon.revenue_shipping AS revenue_shipping,
            1 / num_total * lvcon.revenue_discount AS revenue_discount,
            1 / num_total * lvcon.items AS items
          FROM (
            SELECT /* sites 9 */ /* 2023-12-14,2023-12-15 */ /* Actions */ /* trigger = CronArchive */
				log_conversion.idvisit, COUNT(*) AS num_total
			FROM
				matomo_log_conversion AS log_conversion RIGHT JOIN matomo_log_link_visit_action AS log_vpast ON log_conversion.idvisit = log_vpast.idvisit LEFT JOIN matomo_log_action AS lac_past ON log_vpast.idaction_url = lac_past.idaction
			WHERE
				log_conversion.server_time >= '2023-12-14 23:00:00'
				AND log_conversion.server_time <= '2023-12-15 22:59:59'
				AND log_conversion.idsite IN ('9')AND log_conversion.idgoal = 2 
                          AND log_vpast.server_time <= log_conversion.server_time
                          AND lac_past.type = 1
			GROUP BY
				log_conversion.idvisit
          ) AS r
          LEFT JOIN matomo_log_conversion lvcon ON lvcon.idgoal = 2 AND lvcon.idvisit = r.idvisit
          RIGHT JOIN matomo_log_link_visit_action logv USE INDEX (index_idvisit) ON logv.idvisit = r.idvisit
          LEFT JOIN matomo_log_action lac ON logv.idaction_url = lac.idaction
          WHERE logv.server_time >= '2023-12-14 23:00:00'
            AND logv.server_time <= '2023-12-15 22:59:59'
            AND logv.idsite IN (9) 
            AND lac.type = 1
            AND logv.server_time <= lvcon.server_time
          ) AS yyy
        GROUP BY yyy.idaction
        ORDER BY `9` DESC

Resulting in a runtime of 0.909sec, compared to like 14 days of runtime without the optimizer hints.

Fabian123333 avatar Dec 16 '23 17:12 Fabian123333

This issue is in "needs review" but there has been no activity for 7 days. ping @matomo-org/core-reviewers

github-actions[bot] avatar Dec 26 '23 01:12 github-actions[bot]

This issue is in "needs review" but there has been no activity for 7 days. ping @matomo-org/core-reviewers

github-actions[bot] avatar Jan 04 '24 01:01 github-actions[bot]

This issue is in "needs review" but there has been no activity for 7 days. ping @matomo-org/core-reviewers

github-actions[bot] avatar Jan 29 '24 01:01 github-actions[bot]

This PR was last updated more than one month ago, maybe it's time to close it. Please check if there is anything we still can do or close this PR. ping @matomo-org/core-reviewers

github-actions[bot] avatar Mar 11 '24 01:03 github-actions[bot]