organic-search-analytics
organic-search-analytics copied to clipboard
Add indexes to database tables
Specially important for handling large amounts of rows in search_analytics
.
Definitely index for date
, maybe another combined for date
, domain
and search_engine
for queries such as (inc/code/dataCapture.php
):
$query = "SELECT COUNT( DISTINCT date ) AS record, date FROM ".MySQL::DB_TABLE_SEARCH_ANALYTICS." WHERE search_engine = 'google' AND domain LIKE '".$website."'";
etc.
Thank you for the suggestion @nekromoff. I love the idea of this and recognize it's need.
That said, I'll admit I'm not overly familiar with how index work and how to best configure them.
Would you be willing to commit a pull request with the required changes? Do you know of any good resources to help me better understand indexes, how to set them up, and how they operate?
Thank you.
Those times when I used to optimize MySQL performance are gone for me, but few basic rules:
- Indexes slow down inserting
- Indexes speed up reading / searching (using where clause)
- Combined indexes must be used left to right, e.g. index on columns
date, domain, search_engine
mean it will be used only in query using date, domain, search_engine in this order, e.g.where date=... and domain=...
will work, index for querywhere domain=... and search_engine=...
won't be used (missing the utmost left parameter)
Also: prepend any SELECT query with EXPLAIN
and it will tell you, if any indexes are used.
More on the topic: https://stackoverflow.com/questions/3049283/mysql-indexes-what-are-the-best-practices https://www.slideshare.net/billkarwin/how-to-design-indexes-really
I have currently only tried to add these indexes for search_analytics
table, but without testing their impact on performance, yet,:
INDEX date
INDEX domain
INDEX domain, date, search_engine
BTW, yes, I can try to add indexes to the .sql
file to create database. But maybe you know better what SELECT
queries and in what WHERE
order are used.
This is the primary select query that is used on all of the reports. Report Select
Variables for the query get populated in the Reports class.
Rather than try an muddle through the code, I'd suggest adding echo $reportQuery;
on this line and then run a report. You'll see the query used for that report displayed.
search_analytics seems like the only table worth indexing, but search_engine isn't used in reports, and is therefore not required.
Since all of the queries use domain and a range of dates, I've changed the primary key to "domain, date, id" and converted the id index to unique. This way, that table is physically sorted by domain and date, ready for most queries.
Sure enough, this index is used whenever the query and/or page match is set to "broad", and speed seems to be better than before, regardless of the query type. On 344MB of data, the index size is 60MB, which is roughly a 17% overhead.
Adding indexes on domain, date and query or domain, date and page doesn't help, but when indexing on substrings, e.g. domain(50), date and query(255) OR page(255), they are used for the exact match queries, but not for broad matches. Therefore, the benefits are smaller.
I've also converted the table to InnoDB storage, because I think it's faster and because it doesn't need to be maintained (analysed/repaired).
Suggested SQL for this:
CREATE TABLE `search_analytics` ( `id` int(11) NOT NULL, `domain` varchar(256) NOT NULL, `date` date NOT NULL, `search_engine` varchar(50) NOT NULL, `search_type` varchar(24) DEFAULT NULL, `device_type` varchar(24) DEFAULT NULL, `country` varchar(10) DEFAULT NULL, `query` varchar(500) DEFAULT NULL, `page` varchar(500) DEFAULT NULL, `impressions` int(11) NOT NULL, `clicks` int(11) NOT NULL, `ctr` float NOT NULL, `avg_position` float NOT NULL, `avg_position_click` float DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8;
ALTER TABLE `search_analytics` ADD PRIMARY KEY (`domain`(255),`date`,`id`) USING BTREE, ADD UNIQUE KEY `id_index` (`id`) USING BTREE;
ALTER TABLE `search_analytics` MODIFY `id` int(11) NOT NULL AUTO_INCREMENT;