Learn icon indicating copy to clipboard operation
Learn copied to clipboard

Filtering search results by language is broken

Open adamwoodnz opened this issue 1 year ago • 1 comments

Steps to reproduce:

  1. Navigate to an archive, eg. https://learn.wordpress.org/courses/?new-theme=1
  2. Enter a search term that you can see in the title or excerpt of one of the cards, eg. 'friendly' exists in the excerpt of the Beginner WordPress User course
  3. Submit the results and you should land at https://learn.wordpress.org/?s=friendly&post_type=course with results.
  4. Filter the results by language. At present Italian is the only option.

Expected: No results as all the courses from the search are in english

Actual: The same result set as without the filter

With search, no language With search, and language
Image Image

Note that filtering all courses by language does work https://learn.wordpress.org/courses/?language%5B%5D=it_IT&post_type=course

Image

adamwoodnz avatar Jun 27 '24 00:06 adamwoodnz

This is slightly complicated for a couple of reasons:

  • the indexes for search are all classified as en language, I believe you can create a language specific index, but unsure how that is done
  • language is stored in the site DB as a post_meta (language), but so you could potentially use this, but only if the meta is whitelisted in JP search so the request can be adjusted to include this

pkevan avatar Jul 02 '24 14:07 pkevan

I added language to the JP Search Post Meta whitelist, and triggered a partial sync. The sync validation now says that Post Meta is 100%.

This has still not resolved the issue however.

@kangzj do you have any insight on what more we need to do please?

Post meta Sync validation
Image Image

adamwoodnz avatar Jul 10 '24 03:07 adamwoodnz

We probably want to trigger a re-index - @trakos might know more.

kangzj avatar Jul 10 '24 03:07 kangzj

Hi, this is using the older Inline Search (sometimes also known as Jetpack Search Classic) instead of the Instant Search, correct?

Unlike plugins and patterns, there's no custom index for courses site, from what I can tell. As far as I can tell, we don't index metadata for regular classic search, even if it's synced. One option would be to change language meta field to an allow-listed taxonomy - we do index those for old search.

Another option would be to use our automatically-generated field lang, which is based on post content. It would require modifying the search query using a filter, here's a similar example: https://jetpack.com/support/search/inline-search/customize-inline-search/#filter-include. Instead of category.slug in that example, you would use lang. Variable $query should be your WP_Query object, so you could extract the language argument from that.

I'm not that familiar with the older Inline Search, since we haven't been working on it in my time. @gibrown if you would have time, I would appreciate double-checking if the options I suggested make sense here.

trakos avatar Jul 10 '24 04:07 trakos

Ya that is roughly right.

The lang field may be sufficient to do filtering, but I think you also need to use this filter to search against the correct language analyzer: https://github.com/Automattic/jetpack/blob/trunk/projects/packages/search/src/classic-search/class-classic-search.php#L865 We use one field for each language. e.g. content.en, content.it, content.de, etc. There is also content.default for languages that we don't have custom analyzers for. If you look at the query parser you'll see it does all this for you and so that filter should be sufficient.

You may still need a custom taxonomy though to limit the results to only those posts in a particular locale. Our language detection supports a fixed number of languages and we don't really have a different localization for each. Typically we're using the first two chars of the language: https://github.com/Automattic/jetpack/blob/trunk/projects/packages/search/src/classic-search/class-classic-search.php#L89 (or see that code in the query parser).

A custom taxonomy or category would be the easiest way to filter. The index this is running on (Classic Search) doesn't include any post_meta. The instant search index does include post meta, but (unfortunately) doesn't have this same connector that fully replaces WP_Query search queries. (That's really this issue: https://github.com/Automattic/jetpack/issues/25647)

gibrown avatar Jul 10 '24 23:07 gibrown

Thanks all!

adamwoodnz avatar Jul 10 '24 23:07 adamwoodnz

This might be a different issue, but even when after we add D154812-code, the indexing and querying would still be impossible for old search @trakos @gibrown?

kangzj avatar Jul 11 '24 01:07 kangzj

I believe that this one should work, as you're adding it to const taxonomies; custom taxonomies are indexed for old Search as well. Adding to const public_postmeta in the same file wouldn't work for old Search, though.

trakos avatar Jul 11 '24 02:07 trakos

A custom taxonomy or category would be the easiest way to filter. The index this is running on (Classic Search) doesn't include any post_meta. The instant search index does include post meta, but (unfortunately) doesn't have this same connector that fully replaces WP_Query search queries.

Ok thanks, it really looks like using post meta is going to be overly complicated, and your recommendations seem to pretty strongly point towards using a taxonomy. Fortunately the content in question here doesn't have a lot of language data yet so we can fairly easily migrate it from post meta to taxonomy. Would we need to get the new language taxonomy added to the index like @kangzj did for our level taxonomy in D154812-code?

adamwoodnz avatar Jul 23 '24 23:07 adamwoodnz

@adamwoodnz if you use a taxonomy that already exists, like language, then you we don't need to add anything. See the full list here: https://github.com/Automattic/jetpack/blob/36b2d5232e2ec3a1ad14034ab673fddce3acab97/projects/packages/sync/src/modules/class-search.php#L760

gibrown avatar Jul 24 '24 21:07 gibrown

@adamwoodnz if you use a taxonomy that already exists, like language, then you we don't need to add anything. See the full list here: https://github.com/Automattic/jetpack/blob/36b2d5232e2ec3a1ad14034ab673fddce3acab97/projects/packages/sync/src/modules/class-search.php#L760

Excellent, thanks!

adamwoodnz avatar Jul 24 '24 21:07 adamwoodnz