mdBook icon indicating copy to clipboard operation
mdBook copied to clipboard

Make print page (print.html) links link to anchors on the print page

Open HollowMan6 opened this issue 3 years ago • 16 comments
trafficstars

Resolves #1736

Let all the anchors id on the print page to have a path id prefix to help locate.

e.g. bar/foo.md#abc -> #bar-foo-abc

Also append a dummy div to the start of the original page to make sure that original page links without an anchor can also be located.

Signed-off-by: Hollow Man [email protected]

HollowMan6 avatar Feb 02 '22 13:02 HollowMan6

Tested on the following Rust Bookshelf books with the js code here checking broken links also unavailable anchors on print.html.

With this PR now all the links on the print page are self-contained, no broken links found except for those who are broken originally, or has self-made JavaScript that needs adaptation (ferris.js in Rust Programming Language).

Title Source Original Book Online Version
Cargo Book Source HTML
Edition Guide Source HTML
Embedded Rust Book Source HTML
Mdbook User Guide Source HTML
Rust Reference Source HTML
Rust By Example Source HTML
Rust Programming Language Source HTML
Rustc Book Source HTML
Rustdoc Book Source HTML
Rustonomicon Source HTML

Please review, Will appreciate if this PR can be merged.

cc: @ehuss

HollowMan6 avatar Feb 10 '22 01:02 HollowMan6

This was very helpful for me, it would be nice if this could be merged so i didn't have to build HollowMan6's fork myself in order to generate PDFs whose internal links work

dyaso avatar Jul 15 '22 23:07 dyaso

Hi @ehuss , I've noticed that you added the S-waiting-on-author label to this PR recently, but I can't see any code change requests and even reviews for this PR. Am I missing something and what should I response/clarify?

HollowMan6 avatar Oct 16 '22 14:10 HollowMan6

but I can't see any code change requests and even reviews for this PR. Am I missing something and what should I response/clarify?

I'm so sorry. I had a partial review that I wrote a long time ago, but never finished and didn't click the submit button. But GitHub displays the review on the conversation page as-if it was submitted (with a little icon that I overlooked), so I thought I had submitted it.

ehuss avatar Oct 17 '22 15:10 ehuss

This is just a partial review, there are some other things that I'd like to follow up on.

Done! Thanks for reviewing. Would love to see it get merged in the near future.

HollowMan6 avatar Oct 17 '22 17:10 HollowMan6

This was very helpful for me, it would be nice if this could be merged so i didn't have to build HollowMan6's fork myself in order to generate PDFs whose internal links work

Agreed - this would be really helpful to have in the main branch. Would greatly ease our CICD pipelines.

GeoffMurray avatar Feb 09 '23 02:02 GeoffMurray

Hi @ehuss ! Any update about this PR?

HollowMan6 avatar Apr 07 '23 13:04 HollowMan6

@ehuss Circling back here, recently started using mdBook and merging this would be very helpful.

Thanks in advance!

sjsadowski avatar May 09 '23 12:05 sjsadowski

Unfortunately this PR looks like it still needs a lot of work. I started a longer review, but I just don't currently have the time to work through all of them. There are two classes of issues. One is the code itself, there are various places that aren't written in an idiomatic Rust style. For example, let closure = |path: Option<&Path>| add_base(path); doesn't really make much sense as Rust code goes (it does nothing). The other class is correctness in meaning, where I started to notice some issues with how some of the links are rewritten.

Here is a diff of some of the cosmetic cleanup I started on but didn't finish:

diff --git a/src/renderer/html_handlebars/hbs_renderer.rs b/src/renderer/html_handlebars/hbs_renderer.rs
index 010e1dc2c..41866a4a1 100644
--- a/src/renderer/html_handlebars/hbs_renderer.rs
+++ b/src/renderer/html_handlebars/hbs_renderer.rs
@@ -57,7 +57,7 @@ impl HtmlHandlebars {
         let content = ch.content.clone();
         let content = utils::render_markdown(&content, ctx.html_config.curly_quotes);

-        let fixed_content = utils::render_markdown_with_path_and_redirects(
+        let printed_item = utils::render_markdown_with_path_and_redirects(
             &ch.content,
             ctx.html_config.curly_quotes,
             Some(path),
@@ -70,7 +70,7 @@ impl HtmlHandlebars {
             print_content
                 .push_str(r#"<div style="break-before: page; page-break-before: always;"></div>"#);
         }
-        let path_id = {
+        let print_page_id = {
             let mut base = path.display().to_string();
             if base.ends_with(".md") {
                 base.truncate(base.len() - 3);
@@ -84,10 +84,10 @@ impl HtmlHandlebars {
         // We have to build header links in advance so that we can know the ranges
         // for the headers in one page.
         // Insert a dummy div to make sure that we can locate the specific page.
-        print_content.push_str(&(format!(r#"<div id="{}"></div>"#, &path_id)));
+        print_content.push_str(&(format!(r#"<div id="{print_page_id}"></div>"#)));
         print_content.push_str(&build_header_links(
-            &build_print_element_id(&fixed_content, &path_id),
-            Some(path_id),
+            &build_print_element_id(&printed_item, &print_page_id),
+            Some(print_page_id),
         ));

         // Update the context with data for this file
@@ -212,8 +212,11 @@ impl HtmlHandlebars {
         Ok(())
     }

-    #[cfg_attr(feature = "cargo-clippy", allow(clippy::let_and_return))]
-    fn post_process_print(
+    /// Applies some post-processing to the HTML to apply some adjustments.
+    ///
+    /// This common function is used for both normal chapters (via
+    /// `post_process`) and the combined print page.
+    fn post_process_common(
         &self,
         rendered: String,
         playground_config: &Playground,
@@ -225,7 +228,7 @@ impl HtmlHandlebars {
         rendered
     }

-    #[cfg_attr(feature = "cargo-clippy", allow(clippy::let_and_return))]
+    /// Applies some post-processing to the HTML to apply some adjustments.
     fn post_process(
         &self,
         rendered: String,
@@ -233,7 +236,7 @@ impl HtmlHandlebars {
         edition: Option<RustEdition>,
     ) -> String {
         let rendered = build_header_links(&rendered, None);
-        let rendered = self.post_process_print(rendered, &playground_config, edition);
+        let rendered = self.post_process_common(rendered, &playground_config, edition);

         rendered
     }
@@ -599,7 +602,7 @@ impl Renderer for HtmlHandlebars {
             let rendered = handlebars.render("index", &data)?;

             let rendered =
-                self.post_process_print(rendered, &html_config.playground, ctx.config.rust.edition);
+                self.post_process_common(rendered, &html_config.playground, ctx.config.rust.edition);

             utils::fs::write_file(destination, "print.html", rendered.as_bytes())?;
             debug!("Creating print.html ✓");
@@ -802,7 +805,7 @@ fn make_data(

 /// Go through the rendered print page HTML,
 /// add path id prefix to all the elements id as well as footnote links.
-fn build_print_element_id(html: &str, path_id: &str) -> String {
+fn build_print_element_id(html: &str, print_page_id: &str) -> String {
     static ALL_ID: Lazy<Regex> = Lazy::new(|| Regex::new(r#"(<[^>]*?id=")([^"]+?)""#).unwrap());
     static FOOTNOTE_ID: Lazy<Regex> = Lazy::new(|| {
         Regex::new(
@@ -812,19 +815,22 @@ fn build_print_element_id(html: &str, path_id: &str) -> String {
     });

     let temp_html = ALL_ID.replace_all(html, |caps: &Captures<'_>| {
-        format!("{}{}-{}\"", &caps[1], path_id, &caps[2])
+        format!("{}{}-{}\"", &caps[1], print_page_id, &caps[2])
     });

     FOOTNOTE_ID
         .replace_all(&temp_html, |caps: &Captures<'_>| {
-            format!("{}{}-{}\"", &caps[1], path_id, &caps[2])
+            format!("{}{}-{}\"", &caps[1], print_page_id, &caps[2])
         })
         .into_owned()
 }

 /// Goes through the rendered HTML, making sure all header tags have
 /// an anchor respectively so people can link to sections directly.
-fn build_header_links(html: &str, path_id: Option<&str>) -> String {
+///
+/// `print_page_id` should be set to the print page ID prefix when adjusting the
+/// print page.
+fn build_header_links(html: &str, print_page_id: Option<&str>) -> String {
     static BUILD_HEADER_LINKS: Lazy<Regex> =
         Lazy::new(|| Regex::new(r"<h(\d)>(.*?)</h\d>").unwrap());

@@ -836,7 +842,7 @@ fn build_header_links(html: &str, path_id: Option<&str>) -> String {
                 .parse()
                 .expect("Regex should ensure we only ever get numbers here");

-            insert_link_into_header(level, &caps[2], &mut id_counter, path_id)
+            insert_link_into_header(level, &caps[2], &mut id_counter, print_page_id)
         })
         .into_owned()
 }
@@ -849,10 +855,11 @@ fn insert_link_into_header(
     level: usize,
     content: &str,
     id_counter: &mut HashMap<String, usize>,
-    path_id: Option<&str>,
+    print_page_id: Option<&str>,
 ) -> String {
-    let id = if let Some(path_id) = path_id {
-        utils::unique_id_from_content_with_path(content, id_counter, path_id)
+    let id = if let Some(print_page_id) = print_page_id {
+        let with_prefix = format!("{} {}", print_page_id, content);
+        utils::unique_id_from_content(&with_prefix, id_counter)
     } else {
         utils::unique_id_from_content(content, id_counter)
     };
diff --git a/src/utils/mod.rs b/src/utils/mod.rs
index a9e1298e9..cbf170b63 100644
--- a/src/utils/mod.rs
+++ b/src/utils/mod.rs
@@ -83,14 +83,6 @@ pub fn unique_id_from_content(content: &str, id_counter: &mut HashMap<String, us
     unique_id
 }

-pub(crate) fn unique_id_from_content_with_path(
-    content: &str,
-    id_counter: &mut HashMap<String, usize>,
-    path_id: &str,
-) -> String {
-    unique_id_from_content(&format!("{} {}", path_id, content), id_counter)
-}
-
 /// Improve the path to try remove and solve .. token,
 /// This assumes that `a/b/../c` is `a/c`.
 ///
@@ -136,13 +128,8 @@ fn normalize_path_id(mut path: String) -> String {
 ///
 /// This adjusts links, such as turning `.md` extensions to `.html`.
 ///
-/// `path` is the path to the page being rendered relative to the root of the
-/// book. This is used for the `print.html` page so that links on the print
-/// page go to the anchors that has a path id prefix. Normal page rendering
-/// sets `path` to None.
-///
-/// `redirects` is also only for the print page. It's for adjusting links to
-/// a redirected location to go to the correct spot on the `print.html` page.
+/// See [`render_markdown_with_path_and_redirects`] for a description of
+/// `path` and `redirects`.
 fn adjust_links<'a>(
     event: Event<'a>,
     path: Option<&Path>,
@@ -377,7 +364,16 @@ pub fn new_cmark_parser(text: &str, curly_quotes: bool) -> Parser<'_, '_> {
     Parser::new_ext(text, opts)
 }

-pub fn render_markdown_with_path_and_redirects(
+/// Renders markdown to HTML.
+///
+/// `path` is the path to the page being rendered relative to the root of the
+/// book. This is used for the `print.html` page so that links on the print
+/// page go to the anchors that has a path id prefix. Normal page rendering
+/// sets `path` to None.
+///
+/// `redirects` is also only for the print page. It's for adjusting links to
+/// a redirected location to go to the correct spot on the `print.html` page.
+pub(crate) fn render_markdown_with_path_and_redirects(
     text: &str,
     curly_quotes: bool,
     path: Option<&Path>,

If I find the time, I'll try to come back to this. Alternatively, if there is someone who is an experienced Rust developer who could help with the review here, that may help.

ehuss avatar May 09 '23 14:05 ehuss

@ehuss I wish I could help out, I'm mostly just consuming mdBook - my rust is amateurish at best. Thank you for the quick response, though, it helps with clarity on where things are!

sjsadowski avatar May 09 '23 14:05 sjsadowski

@ehuss Glad to know that this PR is actually been handled by the maintainer all the time! Feel free to commit code directly to this PR, I always keep "Allow edits by maintainers" on.

HollowMan6 avatar May 09 '23 16:05 HollowMan6

i'm using hallowman fork to create pdf files now. waiting for this branch to move to master.

tetsushiawano avatar Oct 29 '23 23:10 tetsushiawano

Hi, @ehuss! It took me some time to resolve the conflicts this time, so I hope this can get merged soon, any plans for continue reviewing?

HollowMan6 avatar Feb 07 '24 17:02 HollowMan6

Thanks. We will review it when we get the time so don't worry :)

Dylan-DPC avatar Feb 08 '24 13:02 Dylan-DPC

I checked the broken links again with the methods described at: https://github.com/rust-lang/mdBook/pull/1738#issuecomment-1034399451 and fixed the following issues:

  • Fix header links such as: (should be type-layout-the-rust-representation instead of type-layout--the-rust-representation) https://github.com/rust-lang/reference/blob/5be836c39a8f6a990ce5da17955cb53bac4b18a8/src/type-layout.md?plain=1#L162
  • Fix link to <a name="xxx"></a> (we should also add path id to those as well https://github.com/rust-lang/rustc-dev-guide/blob/af8e2fe2f8be94e84d6269563ad71b715eb962fe/src/variance.md?plain=1#L146)
  • Fix mailto links (shouldn't append path ID in this case)

Now no other broken links are found with the current version.

HollowMan6 avatar Feb 08 '24 17:02 HollowMan6