
Algolia DocSearch fine-tuning

Open • matks opened this issue 3 years ago • 17 comments

We now run the amazing search provided by https://docsearch.algolia.com/ !

We can improve the search results; here is a to-do list:

  • verify h1, h2, ... structure https://docsearch.algolia.com/docs/tips
  • add a static class DocSearch-content to the main container https://docsearch.algolia.com/docs/required-configuration#use-the-right-classes-as-selectors
  • add meta tags https://docsearch.algolia.com/docs/required-configuration#introduce-global-information-as-meta-tags
  • index hook names
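The first three items can be checked automatically on a rendered page. Below is a minimal, hypothetical checker built on Python's standard `html.parser`. The rules it applies (exactly one `h1`, no skipped heading levels) are our reading of "verify h1, h2, ... structure", and the `docsearch:language` / `docsearch:version` meta tag names follow the DocSearch required-configuration page linked above. `check_page` and the sample page are illustrative only.

```python
from html.parser import HTMLParser


class DocSearchPageCheck(HTMLParser):
    """Collect the signals from the to-do list: heading levels,
    the DocSearch-content container class and docsearch:* meta tags."""

    def __init__(self):
        super().__init__()
        self.headings = []          # heading levels in document order, e.g. [1, 2, 3]
        self.has_content_class = False
        self.meta = {}              # docsearch:* meta tags, name -> content

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("h1", "h2", "h3", "h4", "h5", "h6"):
            self.headings.append(int(tag[1]))
        classes = (attrs.get("class") or "").split()
        if "DocSearch-content" in classes:
            self.has_content_class = True
        name = attrs.get("name") or ""
        if tag == "meta" and name.startswith("docsearch:"):
            self.meta[name] = attrs.get("content", "")


def check_page(html):
    """Return a list of problems found on the page (empty list = looks fine)."""
    parser = DocSearchPageCheck()
    parser.feed(html)
    problems = []
    if parser.headings.count(1) != 1:
        problems.append("expected exactly one h1")
    # A heading should not be more than one level deeper than the previous one.
    for prev, cur in zip(parser.headings, parser.headings[1:]):
        if cur > prev + 1:
            problems.append(f"heading level skipped: h{prev} -> h{cur}")
    if not parser.has_content_class:
        problems.append("no element with class DocSearch-content")
    for required in ("docsearch:language", "docsearch:version"):
        if required not in parser.meta:
            problems.append(f"missing meta tag {required}")
    return problems


# Hypothetical page with an h2 -> h4 jump.
page = """
<html><head>
<meta name="docsearch:language" content="en">
<meta name="docsearch:version" content="8">
</head><body><main class="DocSearch-content">
<h1>List of hooks</h1><h2>Full list</h2><h4>Example hook</h4>
</main></body></html>
"""

print(check_page(page))  # ['heading level skipped: h2 -> h4']
```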

Sources

https://docsearch.algolia.com/docs/tips

https://www.algolia.com/blog/engineering/how-to-build-a-helpful-search-for-technical-documentation-the-laravel-example/

https://docsearch.algolia.com/docs/required-configuration

matks commented on Jun 25 '21

Source of our configuration: https://github.com/algolia/docsearch-configs/blob/master/configs/prestashop.json. If we aim to improve it, we can contribute to that repository.

matks commented on Jun 25 '21

Additional documentation:

  • https://www.algolia.com/doc/guides/managing-results/relevance-overview/
  • https://www.algolia.com/doc/guides/managing-results/must-do/searchable-attributes/
  • https://www.algolia.com/doc/guides/managing-results/refine-results/faceting/

matks commented on Jul 28 '21

@eternoendless, did you work on this?

matks commented on Sep 14 '21

Search results have been filtered by version since this PR: https://github.com/PrestaShop/ps-docs-theme/pull/5
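Algolia's `facetFilters` search parameter uses the `attribute:value` syntax (for example `version:8`). The toy sketch below only illustrates the filtering semantics on in-memory records; it is not how the theme or Algolia implement it, and the records, URLs and the `search` helper are hypothetical. It also shows that a version filter can only return pages that were actually indexed for that version.

```python
def parse_facet_filter(expr):
    """Split a facet filter such as "version:8" into ("version", "8")."""
    attribute, sep, value = expr.partition(":")
    if not sep:
        raise ValueError(f"not an attribute:value filter: {expr!r}")
    return attribute, value


def search(records, query, facet_filters):
    """Toy search: case-insensitive substring match on the title, then keep
    only the records that satisfy every facet filter (filters are ANDed)."""
    q = query.lower()
    parsed = [parse_facet_filter(f) for f in facet_filters]
    return [
        r
        for r in records
        if q in r["title"].lower()
        and all(str(r.get(attr)) == value for attr, value in parsed)
    ]


# Hypothetical records; real DocSearch records have many more fields.
records = [
    {"title": "Console commands", "version": "1.7", "url": "/1.7/console/"},
    {"title": "Console commands", "version": "8", "url": "/8/console/"},
    {"title": "Grid component", "version": "1.7", "url": "/1.7/grid/"},
]

print([r["url"] for r in search(records, "console", ["version:8"])])
# ['/8/console/']
print([r["url"] for r in search(records, "grid", ["version:8"])])
# [] (no v8 record was indexed, so the filter has nothing to return)
```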

eternoendless commented on Sep 14 '21

Right now, there is a massive problem with the indexation of different versions of the docs. Here are the results:

For instance, these are the results for Console in v1.7, which are acceptable: https://capture.dropbox.com/XEthfVFiNbVsrOno

Results for v8, which are not relevant for the user: https://capture.dropbox.com/vw4QjHWhichgKj0y

This problem is also visible in the Algolia administration center, where we can see that indexation for v8 is much lower than for 1.7: https://capture.dropbox.com/aEoLpz0rrq5wtoSk

kpodemski commented on Sep 08 '22

I don't understand why; we're following their documentation and pages are correctly tagged 🤔

eternoendless commented on Sep 26 '22

I'll take this one ^^

MeKeyCool commented on Sep 27 '22

Taking the most frequent searches from Algolia Analytics, I found that they all return broken results in the v8 search, even for documentation pages that didn't move or change.

| Search     | Count | nbHits | totalPercent | v8 doc search check      |
|------------|-------|--------|--------------|--------------------------|
| hook       | 592   | 257    | 0.024%       | NOk (satisfying in v1.7) |
| hooks      | 287   | 193    | 0.012%       | NOk (satisfying in v1.7) |
| form       | 226   | 1015   | 0.009%       | NOk (satisfying in v1.7) |
| override   | 217   | 93     | 0.009%       | NOk (satisfying in v1.7) |
| ajax       | 206   | 27     | 0.008%       | NOk (satisfying in v1.7) |
| product    | 167   | 394    | 0.007%       | NOk (satisfying in v1.7) |
| grid       | 133   | 182    | 0.005%       | NOk (satisfying in v1.7) |
| order      | 132   | 332    | 0.005%       | NOk (satisfying in v1.7) |
| controller | 111   | 228    | 0.004%       | NOk (satisfying in v1.7) |
| cart       | 111   | 440    | 0.004%       | NOk (satisfying in v1.7) |
| cache      | 109   | 44     | 0.004%       | NOk (satisfying in v1.7) |
| mail       | 106   | 269    | 0.004%       | NOk (satisfying in v1.7) |
| smarty     | 97    | 81     | 0.004%       | NOk (satisfying in v1.7) |
| cron       | 93    | 956    | 0.003%       | NOk (satisfying in v1.7) |

I'll check the configuration of the v8-scoped search.

MeKeyCool commented on Oct 03 '22

It seems the crawler has stopped:

Too many missing records: the new index generated by this crawl is missing too many records to replace the production index automatically.

SafeReindexingError: [prestashop] Blocking error:
   The difference between the number of records:
   from : 12.2k
   to   : 0
   is too large (100 %), this limit can be modified in the Editor (currently 10 %)
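The blocking error boils down to a percentage check. The sketch below reproduces the arithmetic from the log (about 12.2k records before, 0 after, against a 10 % limit). The function names are hypothetical, and treating a loss exactly equal to the limit as acceptable is an assumption. As far as we know, this limit corresponds to `safetyChecks.beforeIndexPublishing.maxLostRecordsPercentage` in the Crawler configuration; check the Crawler reference before relying on that name.

```python
def lost_records_percentage(old_count, new_count):
    """Share of production records that the new index would no longer contain."""
    if old_count <= 0:
        return 0.0
    lost = max(old_count - new_count, 0)  # a bigger new index loses nothing
    return 100.0 * lost / old_count


def can_replace_production(old_count, new_count, max_lost_percentage=10.0):
    """Safe-reindexing guard: refuse to swap indices when too much would be lost.
    Assumption: a loss exactly equal to the limit is still accepted."""
    return lost_records_percentage(old_count, new_count) <= max_lost_percentage


# Numbers from the log above (12.2k is rounded in the UI).
print(lost_records_percentage(12_200, 0))      # 100.0
print(can_replace_production(12_200, 0))       # False: the crawl is blocked
print(can_replace_production(12_200, 11_500))  # True: about 5.7 % lost
```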

MeKeyCool commented on Oct 03 '22

Thank you, @MeKeyCool; that is worrying news 😱

matks commented on Oct 03 '22

I think it is caused by the hostname change to devdocs.prestashop-project.org. I'll look into it and make a PR as soon as possible.

MeKeyCool commented on Oct 03 '22

@MeKeyCool

There's also a problem with heading prioritization. Take a look at this page: https://devdocs.prestashop-project.org/1.7/modules/concepts/hooks/list-of-hooks/#full-list

The page with "list of hooks" in its h1 should have a higher priority if you search for "list of hooks".
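As far as we understand the DocSearch record format, each record keeps the heading hierarchy it comes from (`lvl0` to `lvl6`) and the heading level is taken into account when ranking. The toy ranking below only illustrates the expected behavior: for the query "list of hooks", a page whose h1 matches should come before a page where the phrase only appears in an h2. The mapping `lvl1` = h1, the records and the helper functions are assumptions for illustration; real relevance also depends on the index's searchable attributes and ranking settings.

```python
def best_heading_level(record, query):
    """Return the highest-priority heading level (1 = h1) containing the query,
    or None when no heading of the record matches."""
    q = query.lower()
    for level in range(1, 7):
        text = record["hierarchy"].get(f"lvl{level}")
        if text and q in text.lower():
            return level
    return None


def rank_by_heading(records, query):
    """Keep matching records and order them h1 matches first, then h2, etc.
    sorted() is stable, so records with the same level keep their order."""
    scored = [(best_heading_level(r, query), r) for r in records]
    matched = [(level, r) for level, r in scored if level is not None]
    return [r for level, r in sorted(matched, key=lambda pair: pair[0])]


# Hypothetical records.
records = [
    {"url": "/1.7/modules/concepts/hooks/",
     "hierarchy": {"lvl1": "Hooks", "lvl2": "List of hooks"}},
    {"url": "/1.7/modules/concepts/hooks/list-of-hooks/",
     "hierarchy": {"lvl1": "List of hooks", "lvl2": "Full list"}},
]

print([r["url"] for r in rank_by_heading(records, "list of hooks")])
# ['/1.7/modules/concepts/hooks/list-of-hooks/', '/1.7/modules/concepts/hooks/']
```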

kpodemski commented on Oct 03 '22

:+1: I'll add this to my testing process. If someone can update the issue description, it would be good to keep all the "testing" criteria in a complete and concise description ^^

MeKeyCool commented on Oct 03 '22

I sent an email to Algolia to update the domain. @kpodemski is linked, so I hope he will be informed. As I won't be able to follow this subject anymore, I recommend changing the owner.

:point_up: Please note that once the domain has been allowed by an admin, you will need to update the crawler configuration: https://crawler.algolia.com/admin/crawlers/0b7a25f0-3983-498e-8d7b-38e003a8184d/configuration/edit

MeKeyCool commented on Oct 06 '22

Algolia answered me and I updated the crawler's configuration.

It works, but it looks like it didn't solve our problem. Studying the crawler results a bit more, it looks like around half of the URLs are ignored: https://crawler.algolia.com/admin/crawlers/0b7a25f0-3983-498e-8d7b-38e003a8184d/monitoring/summary

I had time to check one of them, and it says: "https://devdocs.prestashop-project.org/8/basics/installation/configuration/ ... Skipped in favor of canonical URL: https://devdocs.prestashop-project.org/1.7/basics/installation/configuration/"

As @kpodemski suggested, it is probably a conflict between the 1.7 and 8 versions inside Algolia.

MeKeyCool commented on Oct 06 '22

Thank you, @MeKeyCool, for your work around this subject! :)

Yes, that's what I thought. Thanks to you, we know it is because of the canonical URLs. Of course, having a canonical URL makes sense. It comes from @eternoendless's PR: https://github.com/PrestaShop/ps-docs-theme/commit/7f24eee5d1eb25819ad7987f0feff04ff0513fd1

I've submitted a change to index pages with the canonical URL: https://github.com/PrestaShop/ps-docs-theme/pull/13

kpodemski commented on Oct 07 '22

Update:

We had to change the crawler settings and set the ignoreCanonicalTo parameter to true.

Now indexation is working as expected.
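The root cause can be reproduced in a few lines: a crawler that honors `<link rel="canonical">` skips the v8 page because its canonical URL points to the 1.7 page, unless it is told to ignore canonicals. The sketch below uses only Python's standard library; `crawl_decision` and its `ignore_canonical_to` flag are hypothetical stand-ins for the crawler's `ignoreCanonicalTo` setting, and the real Algolia Crawler logic may differ. The URLs are the ones from the crawler log quoted above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Extract the href of the first <link rel="canonical">, if any."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rels and self.canonical is None:
            self.canonical = attrs.get("href")


def crawl_decision(page_url, html, ignore_canonical_to=False):
    """Return "index", or "skip in favor of <url>" when the page declares a
    different canonical URL and canonicals are not ignored."""
    finder = CanonicalFinder()
    finder.feed(html)
    if finder.canonical is None or ignore_canonical_to:
        return "index"
    canonical = urljoin(page_url, finder.canonical)  # resolve relative hrefs
    if canonical == page_url:
        return "index"
    return f"skip in favor of {canonical}"


page_url = "https://devdocs.prestashop-project.org/8/basics/installation/configuration/"
html = ('<head><link rel="canonical" href="https://devdocs.prestashop-project.org'
        '/1.7/basics/installation/configuration/"></head>')

print(crawl_decision(page_url, html))
# skip in favor of https://devdocs.prestashop-project.org/1.7/basics/installation/configuration/
print(crawl_decision(page_url, html, ignore_canonical_to=True))
# index
```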

There's still some work to do to improve the quality of the results, but having some results in general for v8 is a good starting point 😅

Thanks again, @MeKeyCool, for fixing the crawler, which helped us find the final solution.

kpodemski commented on Oct 07 '22