elasticsuite icon indicating copy to clipboard operation
elasticsuite copied to clipboard

Extremely slow reindexing when using a MariaDb platform.

Open reviskar opened this issue 6 years ago • 16 comments

The issue only occurs when using a mariaDb platform, and I've found that it only happens with catalog_product_entity_int reindexing in the catalogsearch_fulltext indexer. Because it works in a loop, the effective sync speeds take over an hour to finish in our production server, meanwhile are done in under a minute in a local machine.

Preconditions

MariaDB: 10.2.26

Magento Version : 2.3.2

ElasticSuite Version : 2.8.1/2.8.3

Steps to reproduce

  1. Run $ bin/magento indexer:reindex catalogsearch_fulltext
  2. If using mariaDb 10.2.26 grab some popcorn

Expected result

  1. Normal EXPLAIN when using a MySql 5.7 system:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE entity NULL range PRIMARY PRIMARY 4 NULL 1000 100.00 Using where; Using index
1 SIMPLE t_default NULL ref CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID,catalog_product_entity_int_entity_id_index CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID 4 rekaubamaja.entity.entity_id 21 39.13 Using index condition
1 SIMPLE t_store NULL eq_ref CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID,catalog_product_entity_int_entity_id_index CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID 8 magento_table.entity.entity_id,magento_table.t_default.attribute_id,const 1 100.00 NULL

3 rows (0.008 s),

Actual result

  1. Using MariaDb 10.2.* system:
id select_type table type possible_keys key key_len ref rows r_rows filtered r_filtered Extra
1 SIMPLE entity range PRIMARY PRIMARY 4 NULL 1000 1000.00 100.00 100.00 Using where; Using index
1 SIMPLE t_default range CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID,catalog_product_entity_int_entity_id_index CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID 8 NULL 8000 3017.00 0.12 0.10 Using index condition; Using where
1 SIMPLE t_store eq_ref CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID,catalog_product_entity_int_entity_id_index CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID 8 magento_table.entity.entity_id,magento_table.t_default.attribute_id,const 1 0.66 100.00 100.00  

3 rows (16.280 s),

Analysis

The problem lies somewhat in MariaDb failing to use the primary key index instead of the unique one in catalog_product_entity_int table, but I think the main issue is in the non optimal SQL query executed.

\Smile\ElasticsuiteCatalog\Model\ResourceModel\Eav\Indexer\Fulltext\Datasource\AbstractAttributeData::getAttributesRawData finds attributes using the entity_id's, and for some reason selects them from the main table first only to join the entity tables by the entity_id value.

I've fixed our problem by ditching the Inner join done here and selecting all Id's straight from the t_default (catalog_product_entity_int) table.

So instead of doing ...

        $select->from(['entity' => $this->getEntityMetaData($this->getEntityTypeId())->getEntityTable()], [$entityIdField])
            ->joinInner(
                ['t_default' => $tableName],
                new \Zend_Db_Expr("entity.{$linkField} = t_default.{$linkField}"),
                ['attribute_id']
            )
            ->joinLeft(['t_store' => $tableName], $joinStoreValuesCondition, [])
            ->where('t_default.store_id=?', 0)
            ->where('t_default.attribute_id IN (?)', $attributeIds)
            ->where("entity.{$entityIdField} IN (?)", $entityIds)
            ->columns(['value' => new \Zend_Db_Expr('COALESCE(t_store.value, t_default.value)')]);

... the query works perfectly well written as:

        $select->from(['t_default' => $tableName], [$entityIdField])
            ->joinLeft(['t_store' => $tableName], $joinStoreValuesCondition, [])
            ->where('t_default.store_id=?', 0)
            ->where('t_default.attribute_id IN (?)', $attributeIds)
            ->where("t_default.{$entityIdField} IN (?)", $entityIds)
            ->columns(['t_default.attribute_id', 'value' => new \Zend_Db_Expr('COALESCE(t_store.value, t_default.value)')]);

I would suggest fixing this query :)

reviskar avatar Nov 15 '19 07:11 reviskar

@reviskar for me your query will not work under a Magento Commerce because you get rid of the new \Zend_Db_Expr("entity.{$linkField} = t_default.{$linkField}") which is the only way to properly match the list of $entityIds when using Magento Commerce (and having the staging modules enabled) because there is no entity_id column in the catalog_product_entity_* tables.

romainruaud avatar Nov 18 '19 11:11 romainruaud

True, did not see that as a possibility. We have entity_id fields in our entity table so the fix works well for us at the moment. But as to solving the issue, I'm not sure what the best approach would be here. I would imagine it's doable either way without the double-join.

reviskar avatar Nov 18 '19 11:11 reviskar

I'm not sure the double-join can be avoided because catalog_product_entity table is the only location where you can fetch the row_id <-> entity_id mapping if using Magento Commerce.

romainruaud avatar Nov 18 '19 12:11 romainruaud

@reviskar do you have any new insights on this one ? did you try to upgrade MariaDB to another version ?

I saw some issues on the MariaDB tracker that looks like yours : https://jira.mariadb.org/browse/MDEV-15339?attachmentViewMode=list

romainruaud avatar Feb 12 '20 10:02 romainruaud

@romainruaud Thanks for asking. Seems to be exactly the issue we had. I've solved it as stated above, as it works for us very well and we had limited resources to solve it at that time.

reviskar avatar Feb 17 '20 07:02 reviskar

@romainruaud our service provider will upgrade our staging and production servers to MariaDb 10.4 in a few days time. So we will find out soon enough, if the newer version is smarter in this regard.

reviskar avatar Feb 27 '20 06:02 reviskar

Great, thank you for keeping us in touch about this (very annoying I admit it) issue.

Regards

romainruaud avatar Feb 27 '20 07:02 romainruaud

I recently upraded MariaDB to 10.4 (as it is supported by Magento 2.4.1) and found I had to disable an optimizer_switch or indexation would be very slow when part of the result set was empty. This impacts more than just Elasticsuite (Magento itself also suffers), just figured I'd share what I found.

Rowid_filter is new in MariaDB 10.4, but is about 100 times slower when result set is empty, causing indexation to be 10 times slower in my case (since I use a lot of global attributes, so store view values are empty) set global optimizer_switch='rowid_filter=off';

For my.cnf optimizer_switch=rowid_filter=off

EDIT: Looks like that made it into the Magento documentation!

https://devdocs.magento.com/guides/v2.4/performance-best-practices/configuration.html#indexers

Quazz avatar Nov 09 '20 11:11 Quazz

There's also this: https://github.com/magento/magento2/pull/27129 to consider and potentially patch here.

Improves index times even further.

Quazz avatar Nov 12 '20 14:11 Quazz

Hi @Quazz , thank you for the update.

On our side, we've had good results so far on a MariaDB 10.3 with the following parameters :

in_predicate_conversion_threshold = 0
optimizer_switch = 'optimize_join_buffer_size=on'

romainruaud avatar Nov 23 '20 09:11 romainruaud

@romainruaud Thank you for those. I believe 'optimize_join_buffer_size=on' is default from 10.4.3 onwards now.

Quazz avatar Nov 23 '20 13:11 Quazz

Hi, we're also still facing to this problem with the following configuration:

  • Percona 5.7.33-36
  • Magento 2.3.4
  • elasticsuite 2.8.8

Problem

Profiling: getAttributesRawData take about 40sec image

EXPLAIN of the query shows missing index usage for EAV tables:

id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE entity NULL range CATALOG_PRODUCT_ENTITY_CREATED_IN,CATALOG_PRODUCT_ENTITY_UPDATED_IN,CATALOG_PRODUCT_ENTITY_ENTITY_ID_CREATED_IN_UPDATED_IN CATALOG_PRODUCT_ENTITY_ENTITY_ID_CREATED_IN_UPDATED_IN 12 NULL 323 100 Using where; Using index
1 SIMPLE attr NULL index PRIMARY EAV_ATTRIBUTE_PIM_ID 195 NULL 1017 42.38 Using where; Using index; Using join buffer (Block Nested Loop)
1 SIMPLE t_default NULL eq_ref CATALOG_PRODUCT_ENTITY_VARCHAR_ROW_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_VARCHAR_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_VARCHAR_STORE_ID CATALOG_PRODUCT_ENTITY_VARCHAR_ROW_ID_ATTRIBUTE_ID_STORE_ID 8 magento.entity.row_id,magento.attr.attribute_id,const 1 100 NULL
1 SIMPLE t_store NULL eq_ref CATALOG_PRODUCT_ENTITY_VARCHAR_ROW_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_VARCHAR_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_VARCHAR_STORE_ID CATALOG_PRODUCT_ENTITY_VARCHAR_ROW_ID_ATTRIBUTE_ID_STORE_ID 8 magento.entity.row_id,magento.attr.attribute_id,const 1 100 NULL

Possible solution

I don't found a performant variant of this query that also uses indexes, so I decide to split the query into two parts and join the result back in php which increase the speed.

After the patch getAttributesRawData take about 4sec image

I hope the following patch works like the intended behavior of the original implementation:

  • filter out null values
  • store values will override default values

The following patch is currently not tested in production:

Index: a/src/module-elasticsuite-catalog/Model/ResourceModel/Eav/Indexer/Fulltext/Datasource/AbstractAttributeData.php
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/src/module-elasticsuite-catalog/Model/ResourceModel/Eav/Indexer/Fulltext/Datasource/AbstractAttributeData.php b/src/module-elasticsuite-catalog/Model/ResourceModel/Eav/Indexer/Fulltext/Datasource/AbstractAttributeData.php
--- a/src/module-elasticsuite-catalog/Model/ResourceModel/Eav/Indexer/Fulltext/Datasource/AbstractAttributeData.php	(date 1615973628940)
+++ b/src/module-elasticsuite-catalog/Model/ResourceModel/Eav/Indexer/Fulltext/Datasource/AbstractAttributeData.php	(date 1615973628940)
@@ -16,6 +16,7 @@

 use Magento\Framework\App\ResourceConnection;
 use Magento\Framework\EntityManager\MetadataPool;
+use Magento\Store\Model\Store;
 use Magento\Store\Model\StoreManagerInterface;
 use Smile\ElasticsuiteCatalog\Model\ResourceModel\Eav\Indexer\Indexer;
 use Magento\Eav\Model\ResourceModel\Entity\Attribute\Collection as AttributeCollection;
@@ -104,49 +105,45 @@
      */
     public function getAttributesRawData($storeId, array $entityIds, $tableName, array $attributeIds)
     {
-        $select = $this->connection->select();
-
         // The field modelizing the link between entity table and attribute values table. Either row_id or entity_id.
         $linkField = $this->getEntityMetaData($this->getEntityTypeId())->getLinkField();

         // The legacy entity_id field.
         $entityIdField = $this->getEntityMetaData($this->getEntityTypeId())->getIdentifierField();

-        $joinDefaultValuesCondition = [
-            new \Zend_Db_Expr("entity.$linkField = t_default.$linkField"),
-            't_default.attribute_id = attr.attribute_id',
-            $this->connection->quoteInto('t_default.store_id = ?', \Magento\Store\Model\Store::DEFAULT_STORE_ID),
+        //Define store related conditions, keep the order of the array elements!
+        $storeConditions = [
+            'default' => $this->connection->quoteInto('t_attribute.store_id = ?', Store::DEFAULT_STORE_ID),
+            'store'   => $this->connection->quoteInto('t_attribute.store_id = ?', $storeId),
         ];
-        $joinDefaultValuesCondition = implode(' AND ', $joinDefaultValuesCondition);

-        $joinStoreValuesConditionClauses = [
-            new \Zend_Db_Expr("entity.$linkField = t_store.$linkField"),
-            't_store.attribute_id = attr.attribute_id',
-            $this->connection->quoteInto('t_store.store_id = ?', $storeId),
-        ];
-        $joinStoreValuesCondition = implode(' AND ', $joinStoreValuesConditionClauses);
+        $result = [];
+        foreach ($storeConditions as $condition) {
+            $joinAttributeValuesCondition = [
+                new \Zend_Db_Expr("entity.$linkField = t_attribute.$linkField"),
+                $condition
+            ];
+            $joinAttributeValuesCondition = implode(' AND ', $joinAttributeValuesCondition);

-        $select->from(['entity' => $this->getEntityMetaData($this->getEntityTypeId())->getEntityTable()], [$entityIdField])
-            ->joinInner(
-                ['attr' => $this->getTable('eav_attribute')],
-                $this->connection->quoteInto('attr.attribute_id IN (?)', $attributeIds),
-                ['attribute_id']
-            )
-            ->joinLeft(
-                ['t_default' => $tableName],
-                $joinDefaultValuesCondition,
-                []
-            )
-            ->joinLeft(
-                ['t_store' => $tableName],
-                $joinStoreValuesCondition,
-                []
-            )
-            ->where("entity.{$entityIdField} IN (?)", $entityIds)
-            ->having('value IS NOT NULL')
-            ->columns(['value' => new \Zend_Db_Expr('COALESCE(t_store.value, t_default.value)')]);
+            $select = $this->connection->select();
+            $select->from(['entity' => $this->getEntityMetaData($this->getEntityTypeId())->getEntityTable()], [$entityIdField])
+                ->joinLeft(
+                    ['t_attribute' => $tableName],
+                    $joinAttributeValuesCondition,
+                    ['attribute_id', 'value']
+                )
+                ->where("entity.{$entityIdField} IN (?)", $entityIds)
+                ->where("t_attribute.attribute_id IN (?)", $attributeIds)
+                ->where("t_attribute.value IS NOT NULL");

-        return $this->connection->fetchAll($select);
+            //Get the result and override values from a previous loop
+            foreach ($this->connection->fetchAll($select) as $row) {
+                $key = "{$row['entity_id']}-{$row['attribute_id']}";
+                $result[$key] = $row;
+            }
+        }
+
+        return array_values($result);
     }

     /**

Feel free to merge the patch into the core or leave it as optional patch for huge catalogs.

Wohlie avatar Mar 17 '21 10:03 Wohlie

About 15-25% improvement (for the full catalogsearch reindex) on a 125K ish catalog with 3 store views in a quick test. Can't fully speak to whether the results are the same, though the number of documents in the index seems to match at least.

Quazz avatar Mar 17 '21 15:03 Quazz

Wow, good catch, we'll have to test it with caution, especially with Magento Enterprise where things are handled differently (row_id/entity_id).

romainruaud avatar Apr 07 '21 08:04 romainruaud

@Wohlie did you test it in production ?

By the way, are you using Magento Commerce or Open Source ?

romainruaud avatar Apr 30 '21 07:04 romainruaud

Hi @romainruaud, we are using Magento Commerce 2.3.4. We are currently not live. Go live for this project is end of May. This patch is included since mid-March in our code base.

We don't test the patch for Magento open source but I re-use the original code for this patch. So the chance is high, it will also work with Magneto open source.

Wohlie avatar May 05 '21 19:05 Wohlie

@Wohlie

how time flies :)

Are you still using this patch on production ? If yes, that's probably enough tested now :)

Regards

romainruaud avatar Feb 06 '23 15:02 romainruaud

Hi @romainruaud, yes we still use this patch in production. :)

Wohlie avatar Feb 06 '23 19:02 Wohlie

Ok so this could probably be integrated into the core if this does help that much with performances.

I'll prioritize this.

Regards

romainruaud avatar Feb 06 '23 20:02 romainruaud