[HTML search] Improve relevance scoring for titles and object-name matches.

Open jayaddison opened this issue 1 year ago • 2 comments

Feature or Bugfix

  • Bugfix

Purpose

  • Resolve some sub-optimal search result ranking/scoring reported by #12391.

Detail

  • [x] Add an example project to the JavaScript tests that exhibits the sub-optimal search result behaviour.
  • [x] Add JavaScript test coverage to assert on the expected, improved relative ordering of query results.
  • [x] Implement changes to the indexing/query algorithms to improve the query results without regressing other JavaScript search tests.
    • [x] Merged to fbb62cfda9600d2caa4d6f097745d59cea3a431b from #12393 (thanks @wlach!) and then added some suggested refactoring to that.

Relates

  • Resolves #12391.

jayaddison avatar Jun 19 '24 10:06 jayaddison

Commit cb0f6e7ffe5ec3afe8beebfeeba24b50451e7971 -- regenerating a search index file from scratch -- seems to have been necessary because I had a stale _build directory on my local machine for the relevant input project for the fixture.

That seems like a bug; re-using an existing _build directory should be valid behaviour and should produce the same search index output as a fresh build. I'll file that as a separate issue within the next few days.
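For illustration, one way to check whether a reused _build directory and a fresh build produce the same index is to parse both files and list the top-level keys that differ. This is only a sketch: it assumes the generated searchindex.js is a single `Search.setIndex(...)` call wrapping a JSON object, and the helper names `load_search_index` and `differing_keys` are hypothetical, not part of Sphinx.

```python
import json
from pathlib import Path

# Assumption: searchindex.js has the form `Search.setIndex({...})`.
PREFIX = "Search.setIndex("
_MISSING = object()  # sentinel for "key absent from this index"


def load_search_index(path):
    """Parse a searchindex.js file into a plain dict."""
    text = Path(path).read_text(encoding="utf-8").strip()
    if not (text.startswith(PREFIX) and text.endswith(")")):
        raise ValueError(f"unexpected search index format: {path}")
    return json.loads(text[len(PREFIX):-1])


def differing_keys(index_a, index_b):
    """Return the sorted top-level keys whose values differ between two indexes."""
    keys = index_a.keys() | index_b.keys()
    return sorted(
        k for k in keys
        if index_a.get(k, _MISSING) != index_b.get(k, _MISSING)
    )
```

Calling `differing_keys` on the index from the reused directory and the index from a clean build should return an empty list if incremental builds behave as expected; any key it reports points at the part of the index that went stale.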

jayaddison avatar Jun 19 '24 21:06 jayaddison

I've a slight preference for #12047 to be merged before this, to make the code and diff history easier to follow, if+when either of them are considered ready.

jayaddison avatar Jun 24 '24 17:06 jayaddison

This should be ready for further review / merge; I've no changes planned on this branch.

jayaddison avatar Jul 08 '24 13:07 jayaddison

It'd be good if we have a more complete example where you have a lot of multiple matches of the same kind. Does it cover the issue with the asyncio module that we described?

Yep, the thinking here was to replicate the asyncio relevance ordering problem using a minimal test case, and then to adjust the code to fix it; attempting to apply (and demonstrate) a Test-Driven-Development approach to search ranking fixups. I'll investigate expanding the test fixture data to add more results.
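The test-first idea can be sketched in a few lines: a tiny fixture that reproduces the ordering problem, a scoring function, and an assertion on the relative order of results. Everything below is illustrative only. The weights, the `Doc` fixture type, and the `score`/`search` functions are hypothetical and written in Python for brevity; they are not Sphinx's actual scoring constants or its JavaScript search implementation.

```python
from dataclasses import dataclass

# Hypothetical weights, chosen only for illustration.
TITLE_EXACT = 15
TITLE_PARTIAL = 7
OBJECT_NAME_EXACT = 11
OBJECT_NAME_PARTIAL = 6
BODY_TERM = 5


@dataclass(frozen=True)
class Doc:
    name: str
    title: str
    objects: tuple = ()
    body: str = ""


def score(doc, query):
    """Sum the title, object-name and body-text contributions for one document."""
    q = query.lower()
    total = 0
    title = doc.title.lower()
    if title == q:
        total += TITLE_EXACT
    elif q in title:
        total += TITLE_PARTIAL
    for obj in doc.objects:
        name = obj.lower()
        if name == q:
            total += OBJECT_NAME_EXACT
        elif q in name:
            total += OBJECT_NAME_PARTIAL
    if q in doc.body.lower():
        total += BODY_TERM
    return total


def search(docs, query):
    """Return matching document names, best score first; ties broken by name."""
    hits = [(score(d, query), d.name) for d in docs]
    hits = [(s, n) for s, n in hits if s > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return [n for _, n in hits]


# Minimal fixture: one page whose title and object name match exactly,
# one with only partial object-name matches, one with only a body match.
DOCS = [
    Doc("whatsnew/3.4", "What's New in Python 3.4", (), "The new asyncio module."),
    Doc("library/asyncio-task", "Coroutines and Tasks",
        ("asyncio.run", "asyncio.create_task"), "Run coroutines concurrently."),
    Doc("library/asyncio", "asyncio", ("asyncio",), "asyncio is a library."),
]

# The ordering assertion is the test; the scoring code is adjusted until it passes.
assert search(DOCS, "asyncio") == [
    "library/asyncio",
    "library/asyncio-task",
    "whatsnew/3.4",
]
```

Asserting on relative order rather than on exact scores keeps such a test stable if the weights are tuned later.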

jayaddison avatar Jul 09 '24 10:07 jayaddison

It'd be good if we have a more complete example where you have a lot of multiple matches of the same kind. Does it cover the issue with the asyncio module that we described?

This PR uses a similar approach to what was described/shown in https://github.com/sphinx-doc/sphinx/pull/12393#issuecomment-2131351170 (edit: original link was wrong) so it should.

However, it would be good to run another test before landing to be sure. I tested there by checking out the cpython repository and regenerating the Doc/ directory using a virtualenv with my development version of Sphinx.

wlach avatar Jul 09 '24 11:07 wlach

@jayaddison are you happy with this // ready to review & merge?

AA-Turner avatar Jul 10 '24 22:07 AA-Turner

@AA-Turner yep, I think this is ready.

jayaddison avatar Jul 11 '24 10:07 jayaddison

Thanks all!

AA-Turner avatar Jul 11 '24 10:07 AA-Turner

Thank you @AA-Turner!

jayaddison avatar Jul 11 '24 11:07 jayaddison