Filtering search results by language is broken
Steps to reproduce:
- Navigate to an archive, eg. https://learn.wordpress.org/courses/?new-theme=1
- Enter a search term that you can see in the title or excerpt of one of the cards, eg. 'friendly' exists in the excerpt of the Beginner WordPress User course
- Submit the results and you should land at https://learn.wordpress.org/?s=friendly&post_type=course with results.
- Filter the results by language. At present Italian is the only option.
Expected: No results as all the courses from the search are in english
Actual: The same result set as without the filter
| With search, no language | With search, and language |
|---|---|
Note that filtering all courses by language does work https://learn.wordpress.org/courses/?language%5B%5D=it_IT&post_type=course
This is slightly complicated for a couple of reasons:
- the indexes for search are all classified as
enlanguage, I believe you can create a language specific index, but unsure how that is done - language is stored in the site DB as a post_meta (language), but so you could potentially use this, but only if the meta is whitelisted in JP search so the request can be adjusted to include this
I added language to the JP Search Post Meta whitelist, and triggered a partial sync. The sync validation now says that Post Meta is 100%.
This has still not resolved the issue however.
@kangzj do you have any insight on what more we need to do please?
| Post meta | Sync validation |
|---|---|
We probably want to trigger a re-index - @trakos might know more.
Hi, this is using the older Inline Search (sometimes also known as Jetpack Search Classic) instead of the Instant Search, correct?
Unlike plugins and patterns, there's no custom index for courses site, from what I can tell. As far as I can tell, we don't index metadata for regular classic search, even if it's synced. One option would be to change language meta field to an allow-listed taxonomy - we do index those for old search.
Another option would be to use our automatically-generated field lang, which is based on post content. It would require modifying the search query using a filter, here's a similar example: https://jetpack.com/support/search/inline-search/customize-inline-search/#filter-include. Instead of category.slug in that example, you would use lang. Variable $query should be your WP_Query object, so you could extract the language argument from that.
I'm not that familiar with the older Inline Search, since we haven't been working on it in my time. @gibrown if you would have time, I would appreciate double-checking if the options I suggested make sense here.
Ya that is roughly right.
The lang field may be sufficient to do filtering, but I think you also need to use this filter to search against the correct language analyzer: https://github.com/Automattic/jetpack/blob/trunk/projects/packages/search/src/classic-search/class-classic-search.php#L865 We use one field for each language. e.g. content.en, content.it, content.de, etc. There is also content.default for languages that we don't have custom analyzers for. If you look at the query parser you'll see it does all this for you and so that filter should be sufficient.
You may still need a custom taxonomy though to limit the results to only those posts in a particular locale. Our language detection supports a fixed number of languages and we don't really have a different localization for each. Typically we're using the first two chars of the language: https://github.com/Automattic/jetpack/blob/trunk/projects/packages/search/src/classic-search/class-classic-search.php#L89 (or see that code in the query parser).
A custom taxonomy or category would be the easiest way to filter. The index this is running on (Classic Search) doesn't include any post_meta. The instant search index does include post meta, but (unfortunately) doesn't have this same connector that fully replaces WP_Query search queries. (That's really this issue: https://github.com/Automattic/jetpack/issues/25647)
Thanks all!
This might be a different issue, but even when after we add D154812-code, the indexing and querying would still be impossible for old search @trakos @gibrown?
I believe that this one should work, as you're adding it to const taxonomies; custom taxonomies are indexed for old Search as well. Adding to const public_postmeta in the same file wouldn't work for old Search, though.
A custom taxonomy or category would be the easiest way to filter. The index this is running on (Classic Search) doesn't include any post_meta. The instant search index does include post meta, but (unfortunately) doesn't have this same connector that fully replaces WP_Query search queries.
Ok thanks, it really looks like using post meta is going to be overly complicated, and your recommendations seem to pretty strongly point towards using a taxonomy. Fortunately the content in question here doesn't have a lot of language data yet so we can fairly easily migrate it from post meta to taxonomy. Would we need to get the new language taxonomy added to the index like @kangzj did for our level taxonomy in D154812-code?
@adamwoodnz if you use a taxonomy that already exists, like language, then you we don't need to add anything. See the full list here: https://github.com/Automattic/jetpack/blob/36b2d5232e2ec3a1ad14034ab673fddce3acab97/projects/packages/sync/src/modules/class-search.php#L760
@adamwoodnz if you use a taxonomy that already exists, like
language, then you we don't need to add anything. See the full list here: https://github.com/Automattic/jetpack/blob/36b2d5232e2ec3a1ad14034ab673fddce3acab97/projects/packages/sync/src/modules/class-search.php#L760
Excellent, thanks!