Many locations are listed repeatedly in the search index

Open LilithHafner opened this issue 5 months ago • 2 comments

Examining the search index at https://docs.julialang.org/en/v1.11-dev/#, I noticed that many entries appear multiple times with the same category, location, page, and title (though with different text).

d = documenterSearchIndex.docs; d.length
10083
all_but_text = d.map(function f(dd) {return dd.category + dd.location + dd.page + dd.title;}); all_but_text.length
10083
new Set(all_but_text).size
3854
all_incl_text = d.map(function f(dd) {return dd.category + dd.location + dd.page + dd.title + dd.text;}); all_incl_text.length
10083
new Set(all_incl_text).size
10046

I imagine that aggregation at index-creation time will improve runtime performance slightly without much alteration to result ordering.

One way to aggregate these semi-duplicates is to concatenate their texts.
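
As a rough illustration of that idea, here is a minimal sketch in JavaScript, written against the same `documenterSearchIndex.docs` array used in the console session above. The function name `aggregateSearchIndex`, the newline separator, and the assumption that every entry has a string `text` field are hypothetical choices for illustration; this is not Documenter.jl's implementation, and the real aggregation would belong wherever the index is generated.

```javascript
// Hypothetical helper (not part of Documenter.jl): merge entries that share
// category, location, page, and title by concatenating their texts.
function aggregateSearchIndex(docs) {
  const merged = new Map();
  for (const doc of docs) {
    // Serialize the four fields as a JSON array so that field boundaries are
    // preserved; plain string concatenation could make distinct entries collide.
    const key = JSON.stringify([doc.category, doc.location, doc.page, doc.title]);
    const existing = merged.get(key);
    if (existing === undefined) {
      // First entry with this key: keep a shallow copy so the input is not mutated.
      merged.set(key, { ...doc });
    } else {
      // Later entry with the same key: append its text (separator is an assumption).
      existing.text = existing.text + "\n" + doc.text;
    }
  }
  // A Map iterates in insertion order, so first occurrences keep their relative order.
  return Array.from(merged.values());
}
```

In the browser console, `aggregateSearchIndex(documenterSearchIndex.docs).length` could then be compared against `d.length` to see how much the index shrinks.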

LilithHafner • Jan 20 '24 18:01