git-scm.com

Searching "recurse-submodules" returns incomplete results

Open phil-blain opened this issue 6 years ago • 5 comments

  • [x] This is not an issue about
    • the Git documentation (a.k.a. man/help pages, i.e. anything with a URL starting with https://git-scm.com/docs), which should be raised with the community,
    • the contents of the Pro Git book (i.e. anything with a URL starting with https://git-scm.com/book or its PDF versions), which should be raised at progit/progit2.
    • Git itself, which should also be raised with the community, or
    • Git for Windows, which should be raised at git-for-windows/git.
    • and is actually about the website itself (e.g. website-specific content, JS, or CSS that doesn't seem to be doing its job correctly).

I was searching for all git commands that understand the --recurse-submodules flag, so I searched for "recurse-submodules" on the web site.

I don't know why, but the search results include neither the man page for git read-tree nor the one for git switch, both of which support the flag: https://git-scm.com/docs/git-read-tree#Documentation/git-read-tree.txt---no-recurse-submodules,
https://git-scm.com/docs/git-switch#Documentation/git-switch.txt---recurse-submodules

I think these are the only ones that search doesn't return, since the search page has 10 hits under docs/ and doing

git grep -l "recurse-submodules" -- Documentation/

in git.git returns 12 files.

So I guess something is wrong with the searching/indexing...
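The comparison above (files reported by git grep versus pages returned by the site search) can be sketched as a set difference. This is only an illustration: the function name and the sample inputs are hypothetical, and it assumes each Documentation/*.txt file maps to a page of the same name under docs/, which does not hold for files that are only included by other man pages.

```python
from pathlib import PurePosixPath


def missing_from_search(grep_files, search_hits):
    """Return man page names that git grep found but the site search did not.

    grep_files: paths as printed by `git grep -l`, e.g. "Documentation/git-switch.txt"
    search_hits: result URLs, e.g. "https://git-scm.com/docs/git-switch"
    """
    # "Documentation/git-switch.txt" -> "git-switch"
    expected = {PurePosixPath(path).stem for path in grep_files}
    # "https://git-scm.com/docs/git-switch" -> "git-switch"
    found = {url.rstrip("/").rsplit("/", 1)[-1] for url in search_hits}
    return sorted(expected - found)


# Illustrative sample data, not the real output of either command.
sample_grep = [
    "Documentation/git-pull.txt",
    "Documentation/git-read-tree.txt",
    "Documentation/git-switch.txt",
]
sample_hits = ["https://git-scm.com/docs/git-pull"]
print(missing_from_search(sample_grep, sample_hits))
```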

phil-blain avatar Sep 28 '19 20:09 phil-blain

I think part of the problem is that we don't re-index the manpages often enough (or really, automatically at all). I'm slightly hesitant to reindex them every night, since 99% of the time they don't change. It would be nice if the index job could tell when latest_version was the same and make the job a noop. Or maybe just accept the extra processing. It's not that expensive (it's on the order of 15 seconds of CPU).
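A rough sketch of the "noop when latest_version is the same" idea, not the site's actual index job: only latest_version comes from the comment above, while the function name, the indexed_versions mapping, and the index_page callback are hypothetical stand-ins.

```python
def reindex_if_changed(pages, indexed_versions, index_page):
    """Reindex only the pages whose latest_version differs from the indexed one.

    pages: dict mapping page name -> latest_version
    indexed_versions: dict mapping page name -> version at the last index run
                      (updated in place)
    index_page: callable(name, version) that does the real indexing work
    Returns the list of page names that were reindexed.
    """
    reindexed = []
    for name, latest_version in pages.items():
        if indexed_versions.get(name) == latest_version:
            continue  # unchanged since the last run, so this page is a noop
        index_page(name, latest_version)
        indexed_versions[name] = latest_version
        reindexed.append(name)
    return reindexed
```

With this shape, a nightly run over unchanged man pages does no indexing work at all, and a new Git release only touches the pages whose version moved.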

I just reindexed, and now the git switch manpage shows up. Curiously read-tree doesn't seem to. We seem to cap the result at 10 items per source, though, with no option for pagination. So I'd guess that's the issue there (I didn't look in the code, but searching for something stupidly obvious like "Git" returns exactly 10 hits from the book and 10 hits from the manpages).

peff avatar Sep 28 '19 21:09 peff

Thanks for the quick check and the reindexing. Pagination would help in cases like this, I guess. The website is usually the first place I go to read the man pages, so having search return all matches would be ideal.
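As a minimal sketch of what pagination could look like: Elasticsearch search requests accept "from" and "size" parameters, so a page number can be turned into an offset window. The helper names and the per_page default of 10 (chosen to match the cap observed above) are assumptions, not the site's code.

```python
def page_window(page, per_page=10):
    """Translate a 1-based page number into Elasticsearch-style from/size values."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")
    return {"from": (page - 1) * per_page, "size": per_page}


def paginate(hits, page, per_page=10):
    """Apply the same window to an in-memory list of hits."""
    window = page_window(page, per_page)
    start = window["from"]
    return hits[start:start + window["size"]]


# Twelve matching pages split across two result pages.
all_hits = [f"doc-{i}" for i in range(1, 13)]
print(paginate(all_hits, 1))
print(paginate(all_hits, 2))
```

With twelve matching man pages, the first page would hold ten hits and the second the remaining two, so nothing is silently dropped.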

phil-blain avatar Oct 01 '19 12:10 phil-blain

So I guess https://github.com/git/git-scm.com/blob/2f81e0ce42cb616ce8dba9b3e76df93bfbbf9465/lib/searchable.rb#L20

is the culprit for capping at 10 results. I wanted to test it but could not get the search to work locally. Is that normal? The README does not mention anything special about that...

phil-blain avatar Oct 24 '19 03:10 phil-blain

The search functionality uses Elasticsearch, so you need a local Elasticsearch instance running. I did that once (for #1282), but I can't really remember the details 🤔 I probably should have updated the docs for future situations like this one.
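Before debugging the site itself, it may help to confirm that a local instance is reachable at all. The sketch below assumes Elasticsearch's default HTTP port 9200 on localhost; the address the site actually reads from its configuration is not stated in this thread, and the function name is hypothetical.

```python
import urllib.error
import urllib.request


def elasticsearch_is_up(base_url="http://localhost:9200", timeout=2.0):
    """Return True if an HTTP GET on base_url answers with status 200.

    base_url is an assumption (the Elasticsearch default), not a value
    taken from the site's configuration.
    """
    try:
        with urllib.request.urlopen(base_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or an HTTP error status such as 401.
        return False
```

Calling elasticsearch_is_up() with no arguments checks the default address; a False result means either nothing is listening there or the instance rejected the unauthenticated request.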

pedrorijo91 avatar Oct 24 '19 07:10 pedrorijo91