Searching "recurse-submodules" returns incomplete results
- [x] This is not an issue about
  - the Git documentation (a.k.a. man/help pages, i.e. anything with a URL starting with https://git-scm.com/docs), which should be raised with the community,
  - the contents of the Pro Git book (i.e. anything with a URL starting with https://git-scm.com/book or its PDF versions), which should be raised at progit/progit2,
  - Git itself, which should also be raised with the community, or
  - Git for Windows, which should be raised at git-for-windows/git.
- and is actually about the website itself (e.g. website-specific content, JS, or CSS that doesn't seem to be doing its job correctly).
I was searching for all git commands that understand the --recurse-submodules flag, so I searched for "recurse-submodules" on the web site.
I don't know why, but the search results include neither the man page for git read-tree nor the one for git switch, both of which support the flag:
https://git-scm.com/docs/git-read-tree#Documentation/git-read-tree.txt---no-recurse-submodules,
https://git-scm.com/docs/git-switch#Documentation/git-switch.txt---recurse-submodules
I think these are the only ones that search doesn't return, since the search page has 10 hits under docs/ and doing
git grep -l "recurse-submodules" -- Documentation/
in git.git returns 12 files.
So I guess something is wrong with the searching/indexing...
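To pin down exactly which pages are missing, the two lists can be compared directly. This is only a rough sketch: `page_name` and `missing_from_search` are hypothetical helpers, and the sample lists are illustrative placeholders, not the real output of `git grep` or of the site search.

```ruby
# Hypothetical helper: map a Documentation/*.txt path to its man page name,
# e.g. "Documentation/git-switch.txt" -> "git-switch".
def page_name(path)
  File.basename(path, ".txt")
end

# Returns the pages that grep found but the search results did not include.
def missing_from_search(grep_files, search_hits)
  grep_files.map { |f| page_name(f) } - search_hits
end

# Illustrative sample input only (assumed, not copied from real output).
grep_files = %w[
  Documentation/git-clone.txt
  Documentation/git-read-tree.txt
  Documentation/git-switch.txt
]
search_hits = %w[git-clone]

puts missing_from_search(grep_files, search_hits)
```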
I think part of the problem is that we don't re-index the manpages often enough (or really, automatically at all). I'm slightly hesitant to reindex them every night, since 99% of the time they don't change. It would be nice if the index job could tell when latest_version was the same and make the job a noop. Or maybe just accept the extra processing. It's not that expensive (it's on the order of 15 seconds of CPU).
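A minimal sketch of the noop idea, assuming the job can remember the version it indexed last. `ReindexJob`, `last_indexed_version`, and the block-based indexer are hypothetical names for illustration, not anything from the git-scm.com codebase.

```ruby
# Hypothetical sketch: skip reindexing when the version has not changed.
class ReindexJob
  attr_reader :last_indexed_version, :runs

  def initialize
    @last_indexed_version = nil
    @runs = 0
  end

  # Runs the (expensive) indexing block only when latest_version differs
  # from the version recorded by the previous run.
  # Returns :skipped or :reindexed.
  def call(latest_version)
    return :skipped if latest_version == @last_indexed_version

    yield latest_version if block_given?
    @last_indexed_version = latest_version
    @runs += 1
    :reindexed
  end
end

job = ReindexJob.new
job.call("2.30.0") { |v| puts "indexing manpages for #{v}" } # does the work
job.call("2.30.0") { |v| puts "indexing manpages for #{v}" } # noop
```

With a check like this, a nightly run would only cost the version comparison on days when nothing changed.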
I just reindexed, and now the git switch manpage shows up. Curiously read-tree doesn't seem to. We seem to cap the result at 10 items per source, though, with no option for pagination. So I'd guess that's the issue there (I didn't look in the code, but searching for something stupidly obvious like "Git" returns exactly 10 hits from the book and 10 hits from the manpages).
Thanks for the quick check, and the reindexing. I guess pagination would be good for cases like this. The website is usually the first place I go to read the man pages, so having search return all matches would be ideal.
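For what pagination could look like, here is a rough sketch that turns a page number into an offset and a limit. Elasticsearch's search API accepts these as `from` and `size`; how the site would pass them through is not covered here, and `page_window` and `paginate` are hypothetical helpers applied to an in-memory list purely for illustration.

```ruby
# Hypothetical helper: translate a 1-based page number into offset/limit
# values, named after Elasticsearch's `from` and `size` parameters.
def page_window(page, per_page = 10)
  page = [page.to_i, 1].max # clamp missing or invalid pages to page 1
  { from: (page - 1) * per_page, size: per_page }
end

# Apply the same window to an in-memory list to show the effect.
def paginate(hits, page, per_page = 10)
  window = page_window(page, per_page)
  hits[window[:from], window[:size]] || []
end

hits = (1..12).map { |i| "doc-#{i}" }
first_page  = paginate(hits, 1) # the first 10 hits
second_page = paginate(hits, 2) # the remaining 2 hits
```

In this sketch, 12 matches capped at 10 per page would put the last two on a second page instead of dropping them.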
So I guess https://github.com/git/git-scm.com/blob/2f81e0ce42cb616ce8dba9b3e76df93bfbbf9465/lib/searchable.rb#L20
is the culprit for capping at 10 results. I wanted to test it but could not get the search to work locally. Is that normal? The README does not mention anything special about that...
The search functionality uses Elasticsearch, so you need to have a local Elasticsearch instance running. I did that once (for #1282), but I can't really remember the details 🤔 I probably should have updated the docs for future situations like this one.