sphinx
[HTML search] Improve relevance scoring for titles and object-name matches.
Feature or Bugfix
- Bugfix
Purpose
- Resolve some sub-optimal search result ranking/scoring reported by #12391.
Detail
- [x] Add an example project to the JavaScript tests that exhibits the sub-optimal search result behaviour.
- [x] Add JavaScript test coverage to assert on the expected, improved relative ordering of query results.
- [x] Implement changes to the indexing/query algorithms to improve the query results without regressing other JavaScript search tests.
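The test-first approach in the checklist above can be sketched roughly as follows. This is a minimal illustration in Python rather than the project's JavaScript test suite; the `WEIGHTS` table, the `score` and `search` functions, and the sample documents are all hypothetical and are not Sphinx's actual scorer or fixtures. It only shows the idea of asserting on the relative ordering of results instead of on absolute scores.

```python
from dataclasses import dataclass

# Hypothetical weights for illustration only; not taken from Sphinx.
WEIGHTS = {"title": 15, "object_name": 11, "body": 5}


@dataclass(frozen=True)
class Doc:
    name: str       # document path, used as the result identifier
    title: str      # page title
    objects: tuple  # names of documented objects (modules, classes, ...)
    body: str       # plain body text


def score(doc, query):
    """Sum a weight for each kind of exact, case-insensitive word match."""
    q = query.lower()
    total = 0
    if q in doc.title.lower().split():
        total += WEIGHTS["title"]
    if any(q == obj.lower() for obj in doc.objects):
        total += WEIGHTS["object_name"]
    if q in doc.body.lower().split():
        total += WEIGHTS["body"]
    return total


def search(docs, query):
    """Return matching document names, highest score first, ties by name."""
    hits = [(score(d, query), d.name) for d in docs]
    hits = [(s, n) for s, n in hits if s > 0]
    hits.sort(key=lambda hit: (-hit[0], hit[1]))
    return [name for _, name in hits]


# A tiny hypothetical fixture: one page *about* the term, two that mention it.
DOCS = [
    Doc("changelog", "Changelog", (), "fixed a bug in asyncio streams"),
    Doc("library/asyncio", "asyncio Asynchronous I/O", ("asyncio",),
        "asyncio is a library"),
    Doc("howto", "Concurrency HOWTO", (), "use asyncio or threads"),
]

results = search(DOCS, "asyncio")

# Assert on relative ordering, not on the exact score values.
assert results.index("library/asyncio") < results.index("changelog")
assert results.index("library/asyncio") < results.index("howto")
```

Asserting on relative order keeps such a test stable if the weights are tuned later, as long as title and object-name matches continue to outrank body-only matches.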
- [x] Merged to fbb62cfda9600d2caa4d6f097745d59cea3a431b from #12393 (thanks @wlach!) and then added some suggested refactoring to that.
Relates
- Resolves #12391.
Commit cb0f6e7ffe5ec3afe8beebfeeba24b50451e7971 -- regenerating a search index file from scratch -- seems to have been necessary because I had a stale _build directory on my local machine for the fixture's input project.
That seems like a bug; re-using an existing _build directory should be valid behaviour and should produce the same search index output as a fresh build. I'll file that as a separate issue within the next few days.
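One way to check the expectation above (that an incremental build and a fresh build yield the same search index) is to diff the two generated index files. The sketch below is only an assumption-laden illustration: it assumes the index file consists of a JSON object wrapped in a `Search.setIndex(...)` call, and the helper names `parse_index`, `load_index` and `differing_keys` are hypothetical, not part of Sphinx.

```python
import json
from pathlib import Path

# Assumed wrapper around the JSON payload of the generated index file.
PREFIX, SUFFIX = "Search.setIndex(", ")"

_MISSING = object()  # sentinel so that absent keys differ from null values


def parse_index(text):
    """Strip the assumed JavaScript wrapper and decode the JSON payload."""
    text = text.strip()
    if not (text.startswith(PREFIX) and text.endswith(SUFFIX)):
        raise ValueError("unexpected search index format")
    return json.loads(text[len(PREFIX):-len(SUFFIX)])


def load_index(path):
    """Read and parse a search index file from disk."""
    return parse_index(Path(path).read_text(encoding="utf-8"))


def differing_keys(index_a, index_b):
    """Return the sorted top-level keys whose values differ between indices."""
    keys = index_a.keys() | index_b.keys()
    return sorted(
        k for k in keys
        if index_a.get(k, _MISSING) != index_b.get(k, _MISSING)
    )


# Example usage with hypothetical paths for a stale and a fresh build:
# stale = load_index("_build/html/searchindex.js")
# fresh = load_index("_build_fresh/html/searchindex.js")
# print(differing_keys(stale, fresh))  # an empty list means they match
```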
I've a slight preference for #12047 to be merged before this, to make the code and diff history easier to follow, if+when either of them are considered ready.
This should be ready for further review / merge; I've no changes planned on this branch.
It'd be good if we have a more complete example where you have a lot of multiple matches of the same kind. Does it cover the issue with the asyncio module that we described?
Yep, the thinking here was to replicate the asyncio relevance ordering problem using a minimal test case, and then to adjust the code to fix it; attempting to apply (and demonstrate) a Test-Driven-Development approach to search ranking fixups. I'll investigate expanding the test fixture data to add more results.
> It'd be good if we have a more complete example where you have a lot of multiple matches of the same kind. Does it cover the issue with the asyncio module that we described?
This PR uses a similar approach to what was described/shown in https://github.com/sphinx-doc/sphinx/pull/12393#issuecomment-2131351170 (edit: original link was wrong) so it should.
However, it would be good to run another test before landing, to be sure. I tested there by checking out the cpython repository and regenerating the Doc/ directory using a virtualenv with my development version of Sphinx.
@jayaddison are you happy with this // ready to review & merge?
@AA-Turner yep, I think this is ready.
Thanks all!
Thank you @AA-Turner!