magento2
Report viewed products does not seem to work correctly
Preconditions and environment
- Magento version 2.4.6-p1
Steps to reproduce
- In the file "vendor/magento/module-customer/etc/di.xml", add your user agent to the section below:
  <type name="Magento\Customer\Model\Visitor">
      <arguments>
          <argument name="ignoredUserAgents" xsi:type="array">
              <item name="google1" xsi:type="string">Googlebot/1.0 (googlebot@googlebot.com http://googlebot.com/)</item>
              <item name="google2" xsi:type="string">Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)</item>
              <item name="google3" xsi:type="string">Googlebot/2.1 (+http://www.googlebot.com/bot.html)</item>
          </argument>
      </arguments>
  </type>
- Recompile Magento.
- Generate the static files.
- Clear the caches.
- Visit a product page with your user agent.
Expected result
Since your user agent is configured to be ignored in the file "vendor/magento/module-customer/etc/di.xml", no record should be created in the table "report_viewed_product_index" when you visit any product page of the store.
Actual result
A record is created in the table "report_viewed_product_index" whenever you visit a product page of the store.
Additional information
- The user agents to be ignored should be matched as complete strings in the file "vendor/magento/module-customer/etc/di.xml". By default, the di.xml file contains 3 user agents associated with GoogleBot. But, the GoogleBot user agent contains the string "Chrome/W.X.Y.Z" which changes from time to time based on the version of the Chrome browser used by that very user agent; e.g.: "Chrome/41.0.2272.96". So, the log files of Apache or Nginx should be regularly checked in order to get new user agents associated with GoogleBot. The same happens with BingBot, and maybe with other bots. A relative match would be better, for example matching any user agent including the string "Googlebot/2.1".
- It is not clear how the visitor_id and customer_id fields of the table "report_viewed_product_index" are updated. When a logged-in customer views a product page, the customer_id field gets a value, while the visitor_id field is NULL. When the customer logs out and views another product page, the visitor_id field gets a value while the customer_id field is NULL. However, when the customer has never logged in while browsing the store, both the visitor_id and customer_id fields are NULL.
Due to the above problems, the table "report_viewed_product_index":
- includes data coming from both real visitors and bots, while it is not possible to discriminate the data coming from bots, so as to at least truncate the relevant records. As a result the statistics are polluted by bots.
- can become really huge within a short time depending on the number of products.
- generates slow queries in the database when its size becomes large, as it is used in queries via INNER JOINs. A slow query causes a general performance issue on the database, as the involved tables stay open more time waiting for the query to get executed.
- is updated for every visit of product pages generating unnecessary work load on the database given that a bot can crawl thousands of pages within a day.
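The complete-string limitation described in the first bullet of "Additional information" can be illustrated outside Magento. The Python sketch below is not Magento code: the function names and the token list are hypothetical, and the sample user agent is only modeled on the Googlebot desktop format, using the Chrome version quoted above. It shows why a complete-string comparison misses a crawler whose user agent embeds a changing Chrome version, while a substring comparison on "Googlebot/2.1" still matches.

```python
# Two of the default entries from the di.xml snippet in "Steps to reproduce".
IGNORED_USER_AGENTS = [
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Googlebot/2.1 (+http://www.googlebot.com/bot.html)",
]

# Hypothetical token list for a relative (substring) match.
IGNORED_TOKENS = ["Googlebot/2.1"]


def is_ignored_exact(user_agent, ignored=IGNORED_USER_AGENTS):
    """Complete-string comparison, as the report describes the current behavior."""
    return user_agent in ignored


def is_ignored_partial(user_agent, tokens=IGNORED_TOKENS):
    """Relative match: ignore the request if any token occurs in the user agent."""
    return any(token in user_agent for token in tokens)


# Illustrative crawler user agent with an embedded Chrome version.
crawler_ua = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "Googlebot/2.1; +http://www.google.com/bot.html) "
    "Chrome/41.0.2272.96 Safari/537.36"
)

print(is_ignored_exact(crawler_ua))    # the versioned string is not in the list
print(is_ignored_partial(crawler_ua))  # the token "Googlebot/2.1" is present
```

With a relative match, a new Chrome version in the crawler's user agent would not require a new configuration entry.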
Release note
No response
Triage and priority
- [ ] Severity: S0 - Affects critical data or functionality and leaves users without workaround.
- [ ] Severity: S1 - Affects critical data or functionality and forces users to employ a workaround.
- [ ] Severity: S2 - Affects non-critical data or functionality and forces users to employ a workaround.
- [x] Severity: S3 - Affects non-critical data or functionality and does not force users to employ a workaround.
- [ ] Severity: S4 - Affects aesthetics, professional look and feel, “quality” or “usability”.
Hi @dandrikop. Thank you for your report.
To speed up processing of this issue, make sure that the issue is reproducible on the vanilla Magento instance following Steps to reproduce. To deploy vanilla Magento instance on our environment, add a comment to the issue:
- @magento give me 2.4-develop instance (upcoming 2.4.x release)
- For more details, review the Magento Contributor Assistant documentation.
- Add a comment to assign the issue: @magento I am working on this
- To learn more about issue processing workflow, refer to the Code Contributions.
Join Magento Community Engineering Slack and ask your questions in #github channel.
:warning: According to the Magento Contribution requirements, all issues must go through the Community Contributions Triage process. Community Contributions Triage is a public meeting.
:clock10: You can find the schedule on the Magento Community Calendar page.
:telephone_receiver: The triage of issues happens in the queue order. If you want to speed up the delivery of your contribution, join the Community Contributions Triage session to discuss the appropriate ticket.
@magento I am working on this
Hi @dandrikop! :wave:
Thank you for collaboration. Only members of Community Contributors Team are allowed to be assigned to the issue. Please use the "@magento add to contributors team" command to join the Contributors team.
Hello,
I think that there are two solutions:
- Visits made by bots at product pages are not recorded in the table "report_viewed_product_index", at all.
- Visits made by bots at product pages are recorded in the table "report_viewed_product_index" in a way that they can be identified and deleted, if not needed.
In any case, the list of user agents to be ignored definitely needs to be extended, as there are many bots on the Internet besides Google's. If the administrator cannot be allowed to add user agents to be ignored, these should at least be matched via a relative match; for example, matching any user agent that includes the string "Googlebot", "Bingbot", etc.
In addition, it would be useful if some directions for extending the built-in Magento Customer module can be provided as an example in the scope of enriching the list of user agents to be ignored for the table "report_viewed_product_index".
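The second option could look roughly like the Python sketch below. It uses an in-memory SQLite table as a stand-in for the real one; the "is_bot" column, the table layout, and the helper names are hypothetical and do not exist in Magento's "report_viewed_product_index" table, and the bot tokens are just the two named above.

```python
import sqlite3

# Hypothetical bot tokens, compared case-insensitively.
BOT_TOKENS = ("googlebot", "bingbot")


def looks_like_bot(user_agent):
    """Relative match: True if any known bot token occurs in the user agent."""
    lowered = user_agent.lower()
    return any(token in lowered for token in BOT_TOKENS)


def record_view(conn, product_id, user_agent):
    """Store the view and flag it when the user agent looks like a bot."""
    conn.execute(
        "INSERT INTO viewed_product_index (product_id, is_bot) VALUES (?, ?)",
        (product_id, int(looks_like_bot(user_agent))),
    )


conn = sqlite3.connect(":memory:")
# Simplified stand-in table; 'is_bot' is an assumed extra column.
conn.execute(
    "CREATE TABLE viewed_product_index ("
    "index_id INTEGER PRIMARY KEY, "
    "product_id INTEGER NOT NULL, "
    "is_bot INTEGER NOT NULL DEFAULT 0)"
)

record_view(conn, 101, "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Chrome/120.0.0.0 Safari/537.36")
record_view(conn, 102, "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")

# Bot rows can now be identified and purged without touching real visits.
conn.execute("DELETE FROM viewed_product_index WHERE is_bot = 1")
remaining = [row[0] for row in conn.execute("SELECT product_id FROM viewed_product_index")]
print(remaining)
```

Flagging at write time keeps the cleanup query trivial, at the cost of still writing one row per bot request.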
Hi @engcom-Dash. Thank you for working on this issue. In order to make sure that issue has enough information and ready for development, please read and check the following instruction: :point_down:
- Verify that issue has all the required information. (Preconditions, Steps to reproduce, Expected result, Actual result).
- Verify that issue has a meaningful description and provides enough information to reproduce the issue.
- Add the "Area: XXXXX" label to the ticket, indicating the functional areas it may be related to.
- Verify that the issue is reproducible on the 2.4-develop branch:
  - Add the comment "@magento give me 2.4-develop instance" to deploy test instance on Magento infrastructure.
  - If the issue is reproducible on the 2.4-develop branch, please, add the label "Reproduced on 2.4.x".
  - If the issue is not reproducible, add your comment that issue is not reproducible and close the issue and stop verification process here!
- Join Magento Community Engineering Slack and ask your questions in #github channel.
Hi @dandrikop
Thanks for reporting and collaboration.
Verified the issue on magento dev instance and the issue is not reproducible.
When the user visits any product page of the store, no records are created in the table "report_viewed_product_index", since the user agent is configured to be ignored in the file "vendor/magento/module-customer/etc/di.xml". Please refer to the attached screenshot.
Hello @engcom-Dash,
I provide some details below to make sure the test is performed correctly:
- Verify that the "Product View" Report is enabled at STORES > Configuration > General > Reports > General Options > Enable "Product View" Report=Yes
- Visit a product page of the store, e.g. if product ID is 12412, then the product page is https://www.example.com/catalog/product/view/id/12412
- Check that an entry was created in the table "report_viewed_product_index" for the visited product ID 12412: SELECT * FROM report_viewed_product_index WHERE product_id=12412
- Find your browser's user agent. For example: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36
- Update the file "vendor/magento/module-customer/etc/di.xml" by adding your user agent as a new entry named "google4" in the "ignoredUserAgents" array.
- Recompile Magento, generate the static files, and refresh the caches.
- Visit another product page of the store, e.g. if product ID is 9675, then the product page is https://www.example.com/catalog/product/view/id/9675
- Check that an entry was NOT created in the table "report_viewed_product_index" for the visited product ID 9675, as your user agent should be ignored for the product view report: SELECT * FROM report_viewed_product_index WHERE product_id=9675
Hi @dandrikop
Thank you for reporting and collaboration.
Verified the issue in the Magento dev instance and the issue is reproducible.
An entry is created in the "report_viewed_product_index" table on product visit even though the user agent is configured to be ignored.
Please refer to the attached screenshot.
:white_check_mark: Jira issue https://jira.corp.adobe.com/browse/AC-10814 is successfully created for this GitHub issue.
:white_check_mark: Confirmed by @engcom-Dash. Thank you for verifying the issue.
Issue Available: @engcom-Dash, You will be automatically unassigned. Contributors/Maintainers can claim this issue to continue. To reclaim and continue work, reassign the ticket to yourself.
:x: You don't have permission to export this issue.
Hello @engcom-Dash,
Thanks for identifying the issue.
When designing the fix for this problem, please consider that the user agents to be ignored in the file "vendor/magento/module-customer/etc/di.xml" should be matched as relative (partial) strings. This is because some user agents associated with bots have a part that changes frequently. For example, the user agent of GoogleBot contains the string "Chrome/W.X.Y.Z", which changes from time to time based on the version of the Chrome browser used by that bot; e.g.: "Chrome/41.0.2272.96". The same happens for the user agent of BingBot. Below is the relevant documentation for both GoogleBot and BingBot, which describes this:
https://developers.google.com/search/docs/crawling-indexing/overview-google-crawlers#googlebot-desktop
https://www.bing.com/webmasters/help/which-crawlers-does-bing-use-8c184ec0
So, a relative match would be better. For example, matching any user agent including the string "Googlebot".
Besides the bots of search engines, there are also many other bots on the Internet which crawl websites in the same way search engines do. These bots are mostly SEO crawlers which perform SEO assessments or gather competitor data for their customers. So, it would be useful if the documentation included, as an example, some directions for extending the built-in Magento Customer module in order to extend the list of user agents to be ignored for the "Product View" Report (table "report_viewed_product_index").
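One way to collect candidate user agents for such a list is to scan the web server access log, as suggested in the original report. The Python sketch below assumes the Apache/Nginx "combined" log format, where the user agent is the last double-quoted field of each line. The sample log lines, the "bot_user_agents" helper, and the keyword hints are made up for illustration.

```python
import re
from collections import Counter

# In the "combined" log format the user agent is the final double-quoted field.
USER_AGENT_RE = re.compile(r'"([^"]*)"\s*$')

# Hypothetical keywords that commonly appear in crawler user agents.
BOT_HINTS = ("bot", "crawler", "spider")


def bot_user_agents(log_lines):
    """Count requests per user agent for user agents that look like bots."""
    counts = Counter()
    for line in log_lines:
        match = USER_AGENT_RE.search(line)
        if not match:
            continue  # line does not end with a quoted field
        user_agent = match.group(1)
        if any(hint in user_agent.lower() for hint in BOT_HINTS):
            counts[user_agent] += 1
    return counts


GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BINGBOT = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
BROWSER = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
           "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")

# Made-up sample lines in combined log format.
PREFIX = '203.0.113.7 - - [10/Jan/2024:10:00:00 +0000] '
sample_log = [
    PREFIX + '"GET /catalog/product/view/id/12412 HTTP/1.1" 200 5120 "-" "' + GOOGLEBOT + '"',
    PREFIX + '"GET /catalog/product/view/id/9675 HTTP/1.1" 200 4980 "-" "' + GOOGLEBOT + '"',
    PREFIX + '"GET /catalog/product/view/id/9675 HTTP/1.1" 200 4980 "-" "' + BINGBOT + '"',
    PREFIX + '"GET /catalog/product/view/id/12412 HTTP/1.1" 200 5120 "-" "' + BROWSER + '"',
]

counts = bot_user_agents(sample_log)
for user_agent, hits in counts.most_common():
    print(hits, user_agent)
```

The keyword filter is only a heuristic; user agents that do not announce themselves as bots would need a different signal.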
Hello @engcom-Dash,
Is there any chance that a patch is issued for this issue?
I rely heavily on the "Product View" Report, as I extract the most popular products from it and display them on the front end. I consider this report to have great value if the data are correct, i.e. not "polluted" by visits from bots.