organic-search-analytics icon indicating copy to clipboard operation
organic-search-analytics copied to clipboard

Add indexes to database tables

Open nekromoff opened this issue 7 years ago • 5 comments

Specially important for handling large amounts of rows in search_analytics.

Definitely index for date, maybe another combined for date, domain and search_engine for queries such as (inc/code/dataCapture.php):

$query = "SELECT COUNT( DISTINCT date ) AS record, date FROM ".MySQL::DB_TABLE_SEARCH_ANALYTICS." WHERE search_engine = 'google' AND domain LIKE '".$website."'";

etc.

nekromoff avatar Apr 09 '17 20:04 nekromoff

Thank you for the suggestion @nekromoff. I love the idea of this and recognize it's need.

That said, I'll admit I'm not overly familiar with how index work and how to best configure them.

Would you be willing to commit a pull request with the required changes? Do you know of any good resources to help me better understand indexes, how to set them up, and how they operate?

Thank you.

PromInc avatar Apr 10 '17 01:04 PromInc

Those times when I used to optimize MySQL performance are gone for me, but few basic rules:

  • Indexes slow down inserting
  • Indexes speed up reading / searching (using where clause)
  • Combined indexes must be used left to right, e.g. index on columns date, domain, search_engine mean it will be used only in query using date, domain, search_engine in this order, e.g. where date=... and domain=... will work, index for query where domain=... and search_engine=... won't be used (missing the utmost left parameter)

Also: prepend any SELECT query with EXPLAIN and it will tell you, if any indexes are used.

More on the topic: https://stackoverflow.com/questions/3049283/mysql-indexes-what-are-the-best-practices https://www.slideshare.net/billkarwin/how-to-design-indexes-really

I have currently only tried to add these indexes for search_analytics table, but without testing their impact on performance, yet,:

INDEX	date
INDEX	domain
INDEX	domain, date, search_engine

nekromoff avatar Apr 10 '17 08:04 nekromoff

BTW, yes, I can try to add indexes to the .sql file to create database. But maybe you know better what SELECT queries and in what WHERE order are used.

nekromoff avatar Apr 10 '17 08:04 nekromoff

This is the primary select query that is used on all of the reports. Report Select

Variables for the query get populated in the Reports class.

Rather than try an muddle through the code, I'd suggest adding echo $reportQuery; on this line and then run a report. You'll see the query used for that report displayed.

PromInc avatar Apr 10 '17 10:04 PromInc

search_analytics seems like the only table worth indexing, but search_engine isn't used in reports, and is therefore not required.

Since all of the queries use domain and a range of dates, I've changed the primary key to "domain, date, id" and converted the id index to unique. This way, that table is physically sorted by domain and date, ready for most queries.

Sure enough, this index is used whenever the query and/or page match is set to "broad", and speed seems to be better than before, regardless of the query type. On 344MB of data, the index size is 60MB, which is roughly a 17% overhead.

Adding indexes on domain, date and query or domain, date and page doesn't help, but when indexing on substrings, e.g. domain(50), date and query(255) OR page(255), they are used for the exact match queries, but not for broad matches. Therefore, the benefits are smaller.

I've also converted the table to InnoDB storage, because I think it's faster and because it doesn't need to be maintained (analysed/repaired).

Suggested SQL for this:

CREATE TABLE `search_analytics` ( `id` int(11) NOT NULL, `domain` varchar(256) NOT NULL, `date` date NOT NULL, `search_engine` varchar(50) NOT NULL, `search_type` varchar(24) DEFAULT NULL, `device_type` varchar(24) DEFAULT NULL, `country` varchar(10) DEFAULT NULL, `query` varchar(500) DEFAULT NULL, `page` varchar(500) DEFAULT NULL, `impressions` int(11) NOT NULL, `clicks` int(11) NOT NULL, `ctr` float NOT NULL, `avg_position` float NOT NULL, `avg_position_click` float DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=utf8;

ALTER TABLE `search_analytics` ADD PRIMARY KEY (`domain`(255),`date`,`id`) USING BTREE, ADD UNIQUE KEY `id_index` (`id`) USING BTREE;

ALTER TABLE `search_analytics` MODIFY `id` int(11) NOT NULL AUTO_INCREMENT;

galbaras avatar Jun 07 '17 08:06 galbaras