mdBook Only load searchindex when needed

First commit comes from https://github.com/rust-lang/mdBook/pull/2552.

This PR makes it so that the searchindex.js file is only loaded when the user arrives on a page with ?search= in the URL or if the user opens the search.

cc @notriddle

Feb 20 '25 22:02 GuillaumeGomez

If you're going to mess with the search engine, can you also write some gui tests that exercise it?

Feb 20 '25 22:02 notriddle

Very good point.

Feb 20 '25 23:02 GuillaumeGomez

Added the GUI test. :)

Feb 20 '25 23:02 GuillaumeGomez

When the JS is being retrieved, there is now a spinner and the input is not disabled anymore:

Feb 21 '25 17:02 GuillaumeGomez

Rebased and fixed merge conflict.

Mar 10 '25 12:03 GuillaumeGomez

It looks like you reverted the loading throbber, and now it disables the search input while it's loading again.

https://github.com/rust-lang/mdBook/commit/16cbfd6cfd766cdb3b981696f6b20c1b96b4e901 added it, but the pull request doesn't have it any more.

Mar 16 '25 04:03 notriddle

Arg indeed. Rebase went wrong I guess. Great catch, thanks!

Mar 16 '25 21:03 GuillaumeGomez

Everything seems good here!

Mar 21 '25 01:03 notriddle

Rebased.

Mar 23 '25 09:03 GuillaumeGomez

Re-rebased and also ran eslint.

Mar 23 '25 20:03 GuillaumeGomez

Rebased. If you could take a look @ehuss. If you need help to understand what this PR is doing, don't hesitate to ask!

Mar 31 '25 19:03 GuillaumeGomez

Can you say more about the reasoning behind this change? For me, it comes off as a worse experience, so we are somehow not seeing the same thing. For example, after opening a book and looking at it, I may want to search for something. Today this loads instantly (since it is eagerly loaded), but with this change I may need to wait 10+ seconds for anything to happen. Is this trying to save bandwidth? Or is it some performance issue?

I'm not completely against this kind of change, but the description here doesn't explain the motivation or acknowledge the potential downsides.

Mar 31 '25 20:03 ehuss

Today this loads instantly (since it is eagerly loaded), but with this change I may need to wait 10+ seconds for anything to happen. Is this trying to save bandwidth? Or is it some performance issue?

I'm very surprised. What book do you have this 10+ seconds wait?

The whole point of this PR is to actually speed up the page load and reduce the (page) size by default until you actually need the search. This is becomes more and more useful as the book size (and its search index) grows. This is how we allow rustdoc to always load extremely quickly, whatever the size of the crate (and of its search index).

Mar 31 '25 20:03 GuillaumeGomez

The RFCs book has a 22MB index. I suppose 10s might be a bit of an exaggeration for many people, as I was just doing some throttling tests, though I think we should be sensitive to people without fast internet.

Can you help me understand how this speeds up page load? My understanding is that the index is loaded async in the background. My expectation is that since it is loading in the background, it shouldn't have any measurable effect on initial render time. What little page-load profiling I've done doesn't show a difference with this PR.

Apr 02 '25 18:04 ehuss

It's mostly for people with limited internet access (not in bandwidth but in data) that will get their life improved with this PR: it only loads extra content on demand.

Can you help me understand how this speeds up page load? My understanding is that the index is loaded async in the background. My expectation is that since it is loading in the background, it shouldn't have any measurable effect on initial render time. What little page-load profiling I've done doesn't show a difference with this PR.

I wasn't clear, my bad. The load time should normally be close to not impacted (except I forgot to convert the JS to JSON.parse(), meaning that for big search indexes, all pages will have an impact when the JS will be parsed, because JS parsing is performed in the main thread, should be fixed in https://github.com/rust-lang/mdBook/pull/2633). Here, the impact is to reduce memory usage for the tab (until you actually need the search) and saving data.

Apr 02 '25 19:04 GuillaumeGomez

Ah I found a good analogy: with the RFC book, instead of loading the 22MB search index on all pages whether you need or not, you will only load it when you want to perform a search.

Apr 02 '25 19:04 GuillaumeGomez

:umbrella: The latest upstream changes (possibly #2681) made this pull request unmergeable. Please resolve the merge conflicts.

Apr 30 '25 13:04 rustbot

I'm still a little concerned about the reduced responsiveness of the search box, though not enough to block this change. For example, given the RFC repo's 22MB search index:

Speed	Time
5mbps	37s
40mbps	4.6s
100mbps	1.8s
500mbps	0.37s
1000mbps	0.18s

I don't know what would be a typical network connection speed we should be aiming for. I think a 1 or 2s delay for slower connections although unfortunate, should be fine. And the delay should only happen on the first search while it stays cached.

Can you rebase and resolve the conflict, and then we can merge?

May 30 '25 17:05 ehuss

There's a reason I was so negative on disabling the search field: you should be able to type your query in while the search index is loading, so you don't actually have to wait.

(at least, assuming it takes a couple seconds to type the search query in, and assuming you're not on the 5Mbps link)

May 30 '25 20:05 notriddle

We should also work on reducing the size of the search index.

May 31 '25 07:05 GuillaumeGomez

Fixed merge conflict. :)

May 31 '25 08:05 GuillaumeGomez