elasticsuite
elasticsuite copied to clipboard
Extremely slow reindexing when using a MariaDb platform.
The issue only occurs when using a mariaDb platform, and I've found that it only happens with catalog_product_entity_int reindexing in the catalogsearch_fulltext indexer.
Because it works in a loop, the effective sync speeds take over an hour to finish in our production server, meanwhile are done in under a minute in a local machine.
Preconditions
MariaDB: 10.2.26
Magento Version : 2.3.2
ElasticSuite Version : 2.8.1/2.8.3
Steps to reproduce
- Run
$ bin/magento indexer:reindex catalogsearch_fulltext - If using mariaDb 10.2.26 grab some popcorn
Expected result
- Normal EXPLAIN when using a MySql 5.7 system:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SIMPLE | entity | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 1000 | 100.00 | Using where; Using index |
| 1 | SIMPLE | t_default | NULL | ref | CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID,catalog_product_entity_int_entity_id_index | CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID | 4 | rekaubamaja.entity.entity_id | 21 | 39.13 | Using index condition |
| 1 | SIMPLE | t_store | NULL | eq_ref | CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID,catalog_product_entity_int_entity_id_index | CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID | 8 | magento_table.entity.entity_id,magento_table.t_default.attribute_id,const | 1 | 100.00 | NULL |
3 rows (0.008 s),
Actual result
- Using MariaDb 10.2.* system:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SIMPLE | entity | range | PRIMARY | PRIMARY | 4 | NULL | 1000 | 1000.00 | 100.00 | 100.00 | Using where; Using index |
| 1 | SIMPLE | t_default | range | CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID,catalog_product_entity_int_entity_id_index | CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID | 8 | NULL | 8000 | 3017.00 | 0.12 | 0.10 | Using index condition; Using where |
| 1 | SIMPLE | t_store | eq_ref | CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID,catalog_product_entity_int_entity_id_index | CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID | 8 | magento_table.entity.entity_id,magento_table.t_default.attribute_id,const | 1 | 0.66 | 100.00 | 100.00 |
3 rows (16.280 s),
Analysis
The problem lies somewhat in MariaDb failing to use the primary key index instead of the unique one in catalog_product_entity_int table, but I think the main issue is in the non optimal SQL query executed.
\Smile\ElasticsuiteCatalog\Model\ResourceModel\Eav\Indexer\Fulltext\Datasource\AbstractAttributeData::getAttributesRawData finds attributes using the entity_id's, and for some reason selects them from the main table first only to join the entity tables by the entity_id value.
I've fixed our problem by ditching the Inner join done here and selecting all Id's straight from the t_default (catalog_product_entity_int) table.
So instead of doing ...
$select->from(['entity' => $this->getEntityMetaData($this->getEntityTypeId())->getEntityTable()], [$entityIdField])
->joinInner(
['t_default' => $tableName],
new \Zend_Db_Expr("entity.{$linkField} = t_default.{$linkField}"),
['attribute_id']
)
->joinLeft(['t_store' => $tableName], $joinStoreValuesCondition, [])
->where('t_default.store_id=?', 0)
->where('t_default.attribute_id IN (?)', $attributeIds)
->where("entity.{$entityIdField} IN (?)", $entityIds)
->columns(['value' => new \Zend_Db_Expr('COALESCE(t_store.value, t_default.value)')]);
... the query works perfectly well written as:
$select->from(['t_default' => $tableName], [$entityIdField])
->joinLeft(['t_store' => $tableName], $joinStoreValuesCondition, [])
->where('t_default.store_id=?', 0)
->where('t_default.attribute_id IN (?)', $attributeIds)
->where("t_default.{$entityIdField} IN (?)", $entityIds)
->columns(['t_default.attribute_id', 'value' => new \Zend_Db_Expr('COALESCE(t_store.value, t_default.value)')]);
I would suggest fixing this query :)
@reviskar for me your query will not work under a Magento Commerce because you get rid of the new \Zend_Db_Expr("entity.{$linkField} = t_default.{$linkField}") which is the only way to properly match the list of $entityIds when using Magento Commerce (and having the staging modules enabled) because there is no entity_id column in the catalog_product_entity_* tables.
True, did not see that as a possibility. We have entity_id fields in our entity table so the fix works well for us at the moment. But as to solving the issue, I'm not sure what the best approach would be here. I would imagine it's doable either way without the double-join.
I'm not sure the double-join can be avoided because catalog_product_entity table is the only location where you can fetch the row_id <-> entity_id mapping if using Magento Commerce.
@reviskar do you have any new insights on this one ? did you try to upgrade MariaDB to another version ?
I saw some issues on the MariaDB tracker that looks like yours : https://jira.mariadb.org/browse/MDEV-15339?attachmentViewMode=list
@romainruaud Thanks for asking. Seems to be exactly the issue we had. I've solved it as stated above, as it works for us very well and we had limited resources to solve it at that time.
@romainruaud our service provider will upgrade our staging and production servers to MariaDb 10.4 in a few days time. So we will find out soon enough, if the newer version is smarter in this regard.
Great, thank you for keeping us in touch about this (very annoying I admit it) issue.
Regards
I recently upraded MariaDB to 10.4 (as it is supported by Magento 2.4.1) and found I had to disable an optimizer_switch or indexation would be very slow when part of the result set was empty. This impacts more than just Elasticsuite (Magento itself also suffers), just figured I'd share what I found.
Rowid_filter is new in MariaDB 10.4, but is about 100 times slower when result set is empty, causing indexation to be 10 times slower in my case (since I use a lot of global attributes, so store view values are empty)
set global optimizer_switch='rowid_filter=off';
For my.cnf
optimizer_switch=rowid_filter=off
EDIT: Looks like that made it into the Magento documentation!
https://devdocs.magento.com/guides/v2.4/performance-best-practices/configuration.html#indexers
There's also this: https://github.com/magento/magento2/pull/27129 to consider and potentially patch here.
Improves index times even further.
Hi @Quazz , thank you for the update.
On our side, we've had good results so far on a MariaDB 10.3 with the following parameters :
in_predicate_conversion_threshold = 0
optimizer_switch = 'optimize_join_buffer_size=on'
@romainruaud Thank you for those. I believe 'optimize_join_buffer_size=on' is default from 10.4.3 onwards now.
Hi, we're also still facing to this problem with the following configuration:
- Percona 5.7.33-36
- Magento 2.3.4
- elasticsuite 2.8.8
Problem
Profiling:
getAttributesRawData take about 40sec

EXPLAIN of the query shows missing index usage for EAV tables:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SIMPLE | entity | NULL | range | CATALOG_PRODUCT_ENTITY_CREATED_IN,CATALOG_PRODUCT_ENTITY_UPDATED_IN,CATALOG_PRODUCT_ENTITY_ENTITY_ID_CREATED_IN_UPDATED_IN | CATALOG_PRODUCT_ENTITY_ENTITY_ID_CREATED_IN_UPDATED_IN | 12 | NULL | 323 | 100 | Using where; Using index |
| 1 | SIMPLE | attr | NULL | index | PRIMARY | EAV_ATTRIBUTE_PIM_ID | 195 | NULL | 1017 | 42.38 | Using where; Using index; Using join buffer (Block Nested Loop) |
| 1 | SIMPLE | t_default | NULL | eq_ref | CATALOG_PRODUCT_ENTITY_VARCHAR_ROW_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_VARCHAR_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_VARCHAR_STORE_ID | CATALOG_PRODUCT_ENTITY_VARCHAR_ROW_ID_ATTRIBUTE_ID_STORE_ID | 8 | magento.entity.row_id,magento.attr.attribute_id,const | 1 | 100 | NULL |
| 1 | SIMPLE | t_store | NULL | eq_ref | CATALOG_PRODUCT_ENTITY_VARCHAR_ROW_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_VARCHAR_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_VARCHAR_STORE_ID | CATALOG_PRODUCT_ENTITY_VARCHAR_ROW_ID_ATTRIBUTE_ID_STORE_ID | 8 | magento.entity.row_id,magento.attr.attribute_id,const | 1 | 100 | NULL |
Possible solution
I don't found a performant variant of this query that also uses indexes, so I decide to split the query into two parts and join the result back in php which increase the speed.
After the patch getAttributesRawData take about 4sec

I hope the following patch works like the intended behavior of the original implementation:
- filter out
nullvalues - store values will override default values
The following patch is currently not tested in production:
Index: a/src/module-elasticsuite-catalog/Model/ResourceModel/Eav/Indexer/Fulltext/Datasource/AbstractAttributeData.php
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/module-elasticsuite-catalog/Model/ResourceModel/Eav/Indexer/Fulltext/Datasource/AbstractAttributeData.php b/src/module-elasticsuite-catalog/Model/ResourceModel/Eav/Indexer/Fulltext/Datasource/AbstractAttributeData.php
--- a/src/module-elasticsuite-catalog/Model/ResourceModel/Eav/Indexer/Fulltext/Datasource/AbstractAttributeData.php (date 1615973628940)
+++ b/src/module-elasticsuite-catalog/Model/ResourceModel/Eav/Indexer/Fulltext/Datasource/AbstractAttributeData.php (date 1615973628940)
@@ -16,6 +16,7 @@
use Magento\Framework\App\ResourceConnection;
use Magento\Framework\EntityManager\MetadataPool;
+use Magento\Store\Model\Store;
use Magento\Store\Model\StoreManagerInterface;
use Smile\ElasticsuiteCatalog\Model\ResourceModel\Eav\Indexer\Indexer;
use Magento\Eav\Model\ResourceModel\Entity\Attribute\Collection as AttributeCollection;
@@ -104,49 +105,45 @@
*/
public function getAttributesRawData($storeId, array $entityIds, $tableName, array $attributeIds)
{
- $select = $this->connection->select();
-
// The field modelizing the link between entity table and attribute values table. Either row_id or entity_id.
$linkField = $this->getEntityMetaData($this->getEntityTypeId())->getLinkField();
// The legacy entity_id field.
$entityIdField = $this->getEntityMetaData($this->getEntityTypeId())->getIdentifierField();
- $joinDefaultValuesCondition = [
- new \Zend_Db_Expr("entity.$linkField = t_default.$linkField"),
- 't_default.attribute_id = attr.attribute_id',
- $this->connection->quoteInto('t_default.store_id = ?', \Magento\Store\Model\Store::DEFAULT_STORE_ID),
+ //Define store related conditions, keep the order of the array elements!
+ $storeConditions = [
+ 'default' => $this->connection->quoteInto('t_attribute.store_id = ?', Store::DEFAULT_STORE_ID),
+ 'store' => $this->connection->quoteInto('t_attribute.store_id = ?', $storeId),
];
- $joinDefaultValuesCondition = implode(' AND ', $joinDefaultValuesCondition);
- $joinStoreValuesConditionClauses = [
- new \Zend_Db_Expr("entity.$linkField = t_store.$linkField"),
- 't_store.attribute_id = attr.attribute_id',
- $this->connection->quoteInto('t_store.store_id = ?', $storeId),
- ];
- $joinStoreValuesCondition = implode(' AND ', $joinStoreValuesConditionClauses);
+ $result = [];
+ foreach ($storeConditions as $condition) {
+ $joinAttributeValuesCondition = [
+ new \Zend_Db_Expr("entity.$linkField = t_attribute.$linkField"),
+ $condition
+ ];
+ $joinAttributeValuesCondition = implode(' AND ', $joinAttributeValuesCondition);
- $select->from(['entity' => $this->getEntityMetaData($this->getEntityTypeId())->getEntityTable()], [$entityIdField])
- ->joinInner(
- ['attr' => $this->getTable('eav_attribute')],
- $this->connection->quoteInto('attr.attribute_id IN (?)', $attributeIds),
- ['attribute_id']
- )
- ->joinLeft(
- ['t_default' => $tableName],
- $joinDefaultValuesCondition,
- []
- )
- ->joinLeft(
- ['t_store' => $tableName],
- $joinStoreValuesCondition,
- []
- )
- ->where("entity.{$entityIdField} IN (?)", $entityIds)
- ->having('value IS NOT NULL')
- ->columns(['value' => new \Zend_Db_Expr('COALESCE(t_store.value, t_default.value)')]);
+ $select = $this->connection->select();
+ $select->from(['entity' => $this->getEntityMetaData($this->getEntityTypeId())->getEntityTable()], [$entityIdField])
+ ->joinLeft(
+ ['t_attribute' => $tableName],
+ $joinAttributeValuesCondition,
+ ['attribute_id', 'value']
+ )
+ ->where("entity.{$entityIdField} IN (?)", $entityIds)
+ ->where("t_attribute.attribute_id IN (?)", $attributeIds)
+ ->where("t_attribute.value IS NOT NULL");
- return $this->connection->fetchAll($select);
+ //Get the result and override values from a previous loop
+ foreach ($this->connection->fetchAll($select) as $row) {
+ $key = "{$row['entity_id']}-{$row['attribute_id']}";
+ $result[$key] = $row;
+ }
+ }
+
+ return array_values($result);
}
/**
Feel free to merge the patch into the core or leave it as optional patch for huge catalogs.
About 15-25% improvement (for the full catalogsearch reindex) on a 125K ish catalog with 3 store views in a quick test. Can't fully speak to whether the results are the same, though the number of documents in the index seems to match at least.
Wow, good catch, we'll have to test it with caution, especially with Magento Enterprise where things are handled differently (row_id/entity_id).
@Wohlie did you test it in production ?
By the way, are you using Magento Commerce or Open Source ?
Hi @romainruaud, we are using Magento Commerce 2.3.4. We are currently not live. Go live for this project is end of May. This patch is included since mid-March in our code base.
We don't test the patch for Magento open source but I re-use the original code for this patch. So the chance is high, it will also work with Magneto open source.
@Wohlie
how time flies :)
Are you still using this patch on production ? If yes, that's probably enough tested now :)
Regards
Hi @romainruaud, yes we still use this patch in production. :)
Ok so this could probably be integrated into the core if this does help that much with performances.
I'll prioritize this.
Regards