git-scm.com
Broken link to a ProGit book page presented via search
Searching for keyword presented a result which turned out to be a broken link.
URL for broken page
https://git-scm.com/book/en/Appendix-A:-Git-in-Other-Environments-Git-in-IntelliJ-/-PyCharm-/-WebStorm-/-PhpStorm-/-RubyMine
Problem
When I searched for "intelliJ" in the search bar, I got the result for "Appendix A: Git in Other Environments - Git in IntelliJ / PyCharm / WebStorm / PhpStorm / RubyMine" section of the ProGit book. When I clicked on it though, it resulted in a 404 for the following URL:
https://git-scm.com/book/en/Appendix-A:-Git-in-Other-Environments-Git-in-IntelliJ-/-PyCharm-/-WebStorm-/-PhpStorm-/-RubyMine
I guess the culprit is that the URL should be encoded but it isn't. When I try to access the section by going to https://git-scm.com/book/en/v2 and clicking on the link to the same section, it works fine.
Operating system and browser
Firefox on Windows 10
Steps to reproduce
- Open https://git-scm.com
- Search for "intellij"
- Click on the result under the "Book" category.
Thanks for reporting the issue, @sivaraam. Would you be interested in helping to solve it?
Thanks for asking, @pedrorijo91! I would love to, but I lack the knowledge (Ruby) required to fix this issue myself. I hope someone else will be able to take up the task of fixing it 🙂
I would like to work on this, please, but I need a bit of guidance. I reproduced the issue, but based on the URL there should be a https://git-scm.com/book/en in the repo, and I am unable to find this folder in this repo. Can someone please give me a hint where to find the git-scm.com/book/en/ document folder?
sure @C-Lion !
so, the book routing is done through https://github.com/git/git-scm.com/blob/main/app/controllers/books_controller.rb#L6
if we look into the setup at https://github.com/git/git-scm.com/blob/main/README.md#setup, we'll see how the book content is imported - it uses the rake remote_genbook2 command
now we need to dig into https://github.com/git/git-scm.com/blob/main/lib/tasks/book2.rake#L32 to find where we are adding the chapter title to the search index, and make sure we URL-encode it
I suspect the problem is actually in the search code, not the book importer. In the model for a book section, we index the content and provide an id field: https://github.com/git/git-scm.com/blob/688a6c9ecca89201093f78d7b96d9f3acf54bc2d/app/models/section.rb#L80-L89
And then when we do a search, the section model inherits from Searchable, which formats the results: https://github.com/git/git-scm.com/blob/688a6c9ecca89201093f78d7b96d9f3acf54bc2d/lib/searchable.rb#L47-L62
So one of those probably needs to be URL-encoding things (you can see in the section model that we URL-encode the slug in other contexts when generating links). I'm not sure which place makes more sense. It doesn't look like we use that id for anything else, but maybe there are rules we need to follow for ElasticSearch (otherwise why would it do that weird --- replacement and not just insert a slash).
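To make the encoding step concrete, here is a minimal sketch using `ERB::Util.url_encode` from Ruby's standard library. The helper name `encoded_book_path` and the path layout are assumptions made for illustration, not code from the repository, and whether the book routes accept the encoded form would still need to be checked.

```ruby
require "erb"

# Hypothetical helper (not from the repository): percent-encode a section
# slug so that characters such as "/" and ":" cannot be mistaken for URL
# structure when the slug is used as a single path segment.
def encoded_book_path(lang, slug)
  "/book/#{lang}/#{ERB::Util.url_encode(slug)}"
end

slug = "Appendix-A:-Git-in-Other-Environments-Git-in-IntelliJ-/-PyCharm-/-WebStorm-/-PhpStorm-/-RubyMine"
puts encoded_book_path("en", slug)
```

With this encoding, each "/" in the slug becomes `%2F` and each ":" becomes `%3A`, while letters, digits, and hyphens are left untouched.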
I think I understand the issue but it is not clear from your information which of the above needs to be corrected or what exactly you want changed. Seems like a lot of code just to display a URL. So just confirming this is not just a "correct the URL on the page issue"? I have basic Ruby skills & will try the correct rake command to see if I can get this set up locally.
> So just confirming this is not just a "correct the URL on the page issue"?
Right. The problem is that the URL is not found in this repository at all. It is in content which is imported from another repository (the book code). Hence the complexity. A cron job pulls updated book content into the SQL database nightly, and then we run a search index on that content, putting the result into the Elasticsearch database. Incoming search requests then query the Elasticsearch database. So if we are going to add a layer of quoting to the URL, it needs to happen either when we index the content and stick it into Elasticsearch, or when we pull it out and return it to the browser.
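The two candidate places for the quoting layer can be sketched with a toy model. Everything here is hypothetical: a plain Hash stands in for the search index, and `index_section` and `format_result` are illustrative names, not the methods in section.rb or searchable.rb.

```ruby
require "erb"

# Option 1: quote while indexing, so the stored slug is already URL-safe.
# `index` is a plain Hash standing in for the real search index.
def index_section(index, id, title, slug, encode: false)
  stored = encode ? ERB::Util.url_encode(slug) : slug
  index[id] = { title: title, slug: stored }
end

# Option 2: store the raw slug and quote it while formatting a search hit.
def format_result(index, id, encode: false)
  doc = index.fetch(id)
  slug = encode ? ERB::Util.url_encode(doc[:slug]) : doc[:slug]
  { title: doc[:title], url: "/book/en/#{slug}" }
end

index = {}
index_section(index, 1, "Git in IntelliJ / PyCharm", "Git-in-IntelliJ-/-PyCharm")
puts format_result(index, 1, encode: true)[:url]
```

Whichever side is chosen, only one of them should do the quoting. If both encode, the "%" produced by the first pass is itself encoded as `%25` by the second, and the link breaks in a different way.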