Fix relative links to README.md files

Open chrstnst opened this issue 2 years ago • 10 comments

Fixes #1920

I suggest adding a call to the 'index' preprocessor that swaps the README.md filenames in links inside the chapters' contents.

chrstnst avatar Nov 08 '22 05:11 chrstnst

Fixes #984

chrstnst avatar Nov 08 '22 16:11 chrstnst

@ehuss Is this change easy to review and accept?

sanmai-NL avatar Dec 19 '22 13:12 sanmai-NL

I would prefer to not use regular expressions to translate the links. Can the translation be done in adjust_links or somewhere like that?

ehuss avatar Jan 16 '23 15:01 ehuss

Would love to see this merged.

CaydenPierce avatar Feb 03 '23 17:02 CaydenPierce

@CaydenPierce would you be able to make the requested improvement to the original PR, using a more robust/performant matching algorithm than regular expression matching?

sanmai-NL avatar Feb 03 '23 18:02 sanmai-NL

Hi, I've been reading the discussions on this topic and I'm also interested in seeing mdBook incorporate a solution for this issue.

> Can the translation be done in adjust_links or somewhere like that?

I had a look and came up with something that extends the existing regular expression in adjust_links. However, this would apply to all links, even if the 'index' preprocessor has been disabled. I'm sharing it in case it helps move this forward.

diff --git a/src/utils/mod.rs b/src/utils/mod.rs
index 9156916..d0cda17 100644
--- a/src/utils/mod.rs
+++ b/src/utils/mod.rs
@@ -95,7 +95,7 @@ pub fn unique_id_from_content(content: &str, id_counter: &mut HashMap<String, us
 fn adjust_links<'a>(event: Event<'a>, path: Option<&Path>) -> Event<'a> {
     static SCHEME_LINK: Lazy<Regex> = Lazy::new(|| Regex::new(r"^[a-z][a-z0-9+.-]*:").unwrap());
     static MD_LINK: Lazy<Regex> =
-        Lazy::new(|| Regex::new(r"(?P<link>.*)\.md(?P<anchor>#.*)?").unwrap());
+        Lazy::new(|| Regex::new(r"(?P<link>.*?)(?P<readme>README)?\.md(?P<anchor>#.*)?").unwrap());
 
     fn fix<'a>(dest: CowStr<'a>, path: Option<&Path>) -> CowStr<'a> {
         if dest.starts_with('#') {
@@ -126,7 +126,16 @@ fn adjust_links<'a>(event: Event<'a>, path: Option<&Path>) -> Event<'a> {
             }
 
             if let Some(caps) = MD_LINK.captures(&dest) {
-                fixed_link.push_str(&caps["link"]);
+                let link = &caps["link"];
+                fixed_link.push_str(link);
+                // "Links to README.md will be converted to index.html"
+                if let Some(readme) = caps.name("readme") {
+                    if link.is_empty() || link.ends_with('/') {
+                        fixed_link.push_str("index");
+                    } else {
+                        fixed_link.push_str(readme.as_str());
+                    }
+                }
                 fixed_link.push_str(".html");
                 if let Some(anchor) = caps.name("anchor") {
                     fixed_link.push_str(anchor.as_str());

Several questions I can think of:

  1. How should this handle the fact that the 'index' preprocessor may or may not be run? Maybe the logic should live in src/preprocess/index.rs and make calls to adjust_links.
  2. "I would prefer to not use regular expressions to translate the links.": Is it ok to extend the regex in adjust_links?
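
For comparison, here is a minimal sketch of the same translation done with plain string operations instead of a regular expression. It is not mdBook code: `translate_md_link` and its `index_enabled` flag are hypothetical names, the flag merely stands in for "the 'index' preprocessor is active", and the caller is assumed to have already skipped links that carry a scheme such as `https:`. It is also stricter than the regex in the diff, since it only rewrites destinations whose path ends in `.md`.

```rust
/// Hypothetical helper (not part of mdBook): rewrites a relative `.md` link
/// to `.html` without using a regular expression.
/// `index_enabled` is an assumed flag meaning "README.md becomes index.html".
fn translate_md_link(dest: &str, index_enabled: bool) -> Option<String> {
    // Split off an optional `#anchor` suffix so it can be re-attached later.
    let (path, anchor) = match dest.find('#') {
        Some(pos) => (&dest[..pos], &dest[pos..]),
        None => (dest, ""),
    };

    // Only destinations whose path ends in `.md` are rewritten.
    let stem = path.strip_suffix(".md")?;

    // Separate the directory prefix (keeping its trailing '/') from the file stem.
    let (dir, file) = match stem.rfind('/') {
        Some(pos) => (&stem[..=pos], &stem[pos + 1..]),
        None => ("", stem),
    };

    // Swap only an exact `README` stem, and only when the flag is set.
    let file = if index_enabled && file == "README" {
        "index"
    } else {
        file
    };

    Some(format!("{}{}.html{}", dir, file, anchor))
}
```

With the flag set, `sub/README.md#intro` becomes `sub/index.html#intro`, while `myREADME.md` keeps its name and becomes `myREADME.html`, which matches the `link.is_empty() || link.ends_with('/')` check in the diff. With the flag cleared, `README.md` simply becomes `README.html`.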

ghost avatar Jun 30 '23 10:06 ghost

Maybe README.md could be mapped to index.html AND README.html. This way the links don't need to be updated. Just an alternative as changing the links seems to be complicated. I don't believe that having README.md should be a problem for users. After all, users can have either paths ending in / or in /index.html, so why not /README.html as well? It might not look as cool as having URLs end in / but it is simple and robust.
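
To make the idea concrete, here is a rough sketch of how the output files for one chapter could be computed. `html_outputs` is a hypothetical helper, not an existing mdBook function, and it assumes chapters are identified by their source path relative to the book's source directory.

```rust
use std::ffi::OsStr;
use std::path::{Path, PathBuf};

/// Hypothetical helper: lists the HTML files to emit for one source chapter.
/// A `README.md` chapter yields both `README.html` and `index.html`.
fn html_outputs(src: &Path) -> Vec<PathBuf> {
    // Every chapter is rendered under its own name with an `.html` extension.
    let mut outputs = vec![src.with_extension("html")];

    // A README chapter is additionally written as `index.html` in the same directory.
    if src.file_stem() == Some(OsStr::new("README")) {
        outputs.push(src.with_file_name("index.html"));
    }

    outputs
}
```

For `guide/README.md` this returns `guide/README.html` and `guide/index.html`, so existing links to either name keep working without being rewritten.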

bitdivine avatar Sep 02 '23 07:09 bitdivine

> Maybe README.md could be mapped to index.html AND README.html. This way the links don't need to be updated. Just an alternative as changing the links seems to be complicated. I don't believe that having README.md should be a problem for users. After all, users can have either paths ending in / or in /index.html, so why not /README.html as well? It might not look as cool as having URLs end in / but it is simple and robust.

That would work great, but the canonical address should be recorded in the HTML to prevent duplicate content for indexers like online search engines.
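
As an illustration only, the canonical address could be emitted as a `<link rel="canonical">` element in the head of both copies. `canonical_link_tag` and its `site_url` parameter are hypothetical (nothing in this thread says mdBook has such a helper), and the sketch does no HTML escaping, so it assumes the inputs contain no quotes or angle brackets.

```rust
/// Hypothetical helper: builds a `<link rel="canonical">` tag for a page.
/// `site_url` is an assumed base URL of the published book; `page` is the
/// path of the copy chosen as canonical, relative to that base.
fn canonical_link_tag(site_url: &str, page: &str) -> String {
    // Normalise the slashes so exactly one separates the base from the page.
    let base = site_url.trim_end_matches('/');
    let page = page.trim_start_matches('/');
    format!("<link rel=\"canonical\" href=\"{}/{}\">", base, page)
}
```

Both `README.html` and `index.html` would then carry the same tag, for example one pointing at `guide/index.html`, so that indexers treat the two files as a single page.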

sanmai-NL avatar Sep 02 '23 09:09 sanmai-NL

Looking forward to seeing this fixed. Thanks!

jlinford avatar Oct 02 '23 21:10 jlinford