mdBook
Fix relative links to README.md files
Fixes #1920
I suggest adding a step to the 'index' preprocessor that swaps the file names in links within the chapters' contents (README.md to index.md).
Fixes #984
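The suggestion above could work at the Markdown level: the 'index' preprocessor could rewrite link destinations that point at a README.md so they point at index.md instead, and the existing .md to .html conversion would then produce index.html. Because the rewrite would live in the preprocessor, it would be skipped whenever that preprocessor is disabled. A rough, std-only sketch; `rewrite_readme_links` and `rewrite_dest` are hypothetical names, not mdBook code. The naive `](...)` scan ignores reference-style links, code spans, link titles and destinations containing parentheses; a real implementation would more likely act on the Markdown parser's link events.

```rust
/// Hypothetical helper: rewrite `README.md` to `index.md` in the destination
/// of inline Markdown links such as `[Intro](guide/README.md#setup)`.
/// Naive scanner: ignores reference-style links, code spans, link titles
/// and destinations containing parentheses.
fn rewrite_readme_links(content: &str) -> String {
    let mut out = String::with_capacity(content.len());
    let mut rest = content;
    // Each inline link destination starts right after "](".
    while let Some(start) = rest.find("](") {
        let (before, after) = rest.split_at(start + 2);
        out.push_str(before);
        match after.find(')') {
            Some(end) => {
                out.push_str(&rewrite_dest(&after[..end]));
                // Keep the closing ')' for the next round.
                rest = &after[end..];
            }
            None => {
                // Unterminated link: copy the remainder unchanged.
                rest = after;
                break;
            }
        }
    }
    out.push_str(rest);
    out
}

/// Replace a final `README.md` path component with `index.md`, leaving
/// absolute URLs (containing "://") and any `#anchor` untouched.
fn rewrite_dest(dest: &str) -> String {
    if dest.contains("://") {
        return dest.to_string();
    }
    let (path, anchor) = match dest.find('#') {
        Some(i) => dest.split_at(i),
        None => (dest, ""),
    };
    match path.strip_suffix("README.md") {
        // Only a whole file name counts, so `myREADME.md` is left alone.
        Some(dir) if dir.is_empty() || dir.ends_with('/') => {
            format!("{dir}index.md{anchor}")
        }
        _ => dest.to_string(),
    }
}

fn main() {
    let chapter = "See [setup](guide/README.md#install) and [home](https://example.com/README.md).";
    println!("{}", rewrite_readme_links(chapter));
}
```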
@ehuss Is this change easy to review and accept?
I would prefer to not use regular expressions to translate the links. Can the translation be done in adjust_links or somewhere like that?
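Without regular expressions, the `.md` to `.html` step could be written with plain string methods (`find`, `split_at`, `strip_suffix`). A minimal sketch; `md_to_html` is a hypothetical helper, not mdBook's actual code, and it assumes scheme links and fragment-only links have already been filtered out, as `adjust_links` does before the MD_LINK match. Unlike the unanchored MD_LINK regex, it only rewrites paths that end in `.md`.

```rust
/// Hypothetical regex-free replacement for the MD_LINK step in `adjust_links`.
/// Assumes `dest` is already known to be a relative, non-fragment-only link.
fn md_to_html(dest: &str) -> Option<String> {
    // Split off an optional "#anchor" so it can be re-appended unchanged.
    let (path, anchor) = match dest.find('#') {
        Some(i) => dest.split_at(i),
        None => (dest, ""),
    };
    // Not a Markdown link: return None so the caller keeps `dest` as-is.
    let stem = path.strip_suffix(".md")?;
    // Only a whole `README` file name maps to `index` (`myREADME.md` does not).
    let stem = match stem.strip_suffix("README") {
        Some(dir) if dir.is_empty() || dir.ends_with('/') => format!("{dir}index"),
        _ => stem.to_string(),
    };
    Some(format!("{stem}.html{anchor}"))
}

fn main() {
    for dest in ["README.md", "guide/README.md#setup", "myREADME.md", "image.png"] {
        println!("{dest} -> {:?}", md_to_html(dest));
    }
}
```

Returning `Option` lets the caller fall back to pushing the original destination, mirroring the existing `else` branch in `adjust_links`.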
Would love to see this merged.
@CaydenPierce would you be able to make the requested improvement to the original PR, using a more robust/performant matching algorithm than regular expression matching?
Hi, I've been reading the discussions on this topic and I'm also interested in seeing mdBook incorporate a solution for this issue.
Can the translation be done in adjust_links or somewhere like that?
I had a look and came up with something that extends the existing regular expression in adjust_links. However, this would apply to all links, even if the 'index' preprocessor has been disabled. I'm sharing it in case it helps move this forward.
```diff
diff --git a/src/utils/mod.rs b/src/utils/mod.rs
index 9156916..d0cda17 100644
--- a/src/utils/mod.rs
+++ b/src/utils/mod.rs
@@ -95,7 +95,7 @@ pub fn unique_id_from_content(content: &str, id_counter: &mut HashMap<String, us
 fn adjust_links<'a>(event: Event<'a>, path: Option<&Path>) -> Event<'a> {
     static SCHEME_LINK: Lazy<Regex> = Lazy::new(|| Regex::new(r"^[a-z][a-z0-9+.-]*:").unwrap());
     static MD_LINK: Lazy<Regex> =
-        Lazy::new(|| Regex::new(r"(?P<link>.*)\.md(?P<anchor>#.*)?").unwrap());
+        Lazy::new(|| Regex::new(r"(?P<link>.*?)(?P<readme>README)?\.md(?P<anchor>#.*)?").unwrap());
 
     fn fix<'a>(dest: CowStr<'a>, path: Option<&Path>) -> CowStr<'a> {
         if dest.starts_with('#') {
@@ -126,7 +126,16 @@ fn adjust_links<'a>(event: Event<'a>, path: Option<&Path>) -> Event<'a> {
             }
 
             if let Some(caps) = MD_LINK.captures(&dest) {
-                fixed_link.push_str(&caps["link"]);
+                let link = &caps["link"];
+                fixed_link.push_str(link);
+                // "Links to README.md will be converted to index.html"
+                if let Some(readme) = caps.name("readme") {
+                    if link.is_empty() || link.ends_with('/') {
+                        fixed_link.push_str("index");
+                    } else {
+                        fixed_link.push_str(readme.as_str());
+                    }
+                }
                 fixed_link.push_str(".html");
                 if let Some(anchor) = caps.name("anchor") {
                     fixed_link.push_str(anchor.as_str());
```
Several questions I can think of:
- How should we deal with the fact that the 'index' preprocessor may or may not be run? Maybe the logic should live in src/preprocess/index.rs and call adjust_links from there.
- "I would prefer to not use regular expressions to translate the links.": Is it OK to extend the existing regex in adjust_links?
Maybe README.md could be mapped to both index.html AND README.html. This way the links don't need to be updated. It's just an alternative, since changing the links seems to be complicated. I don't believe that having README.md should be a problem for users. After all, users can already have paths ending in either / or /index.html, so why not /README.html as well? It might not look as nice as URLs ending in /, but it is simple and robust.
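The dual-output idea amounts to writing each README.md chapter twice. A minimal sketch of how a renderer could choose the output paths; `output_files` is a hypothetical helper, not part of mdBook's HTML renderer.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: list the HTML files to write for a chapter.
/// A `README.md` chapter is written twice: as `index.html` (so `dir/`
/// URLs work) and as `README.html` (so unmodified links still resolve).
fn output_files(chapter: &Path) -> Vec<PathBuf> {
    let html = chapter.with_extension("html");
    if chapter.file_name().map_or(false, |name| name == "README.md") {
        vec![chapter.with_file_name("index.html"), html]
    } else {
        vec![html]
    }
}

fn main() {
    for chapter in ["guide/README.md", "guide/setup.md"] {
        println!("{chapter} -> {:?}", output_files(Path::new(chapter)));
    }
}
```

The cost is one extra file per README chapter, which is what makes the canonical-URL question relevant.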
That would work great, but the canonical address should be recorded in the HTML (for example with a <link rel="canonical"> tag) so that indexers such as online search engines don't treat the two copies as duplicate content.
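One way to record the canonical address is a `<link rel="canonical">` tag in each page's `<head>`, pointing both the README.html and index.html copies at the directory URL. A minimal sketch; `canonical_tag` and its `site_url` parameter (the book's public base URL, assumed to end with '/') are hypothetical and not existing mdBook options.

```rust
/// Hypothetical helper: build the canonical <link> tag for a rendered page.
/// Both `dir/README.html` and `dir/index.html` point at `dir/`.
fn canonical_tag(site_url: &str, html_path: &str) -> String {
    let canonical = ["README.html", "index.html"]
        .iter()
        .find_map(|&name| html_path.strip_suffix(name))
        // Only whole file names count, so `myREADME.html` keeps its own URL.
        .filter(|dir| dir.is_empty() || dir.ends_with('/'))
        .unwrap_or(html_path);
    format!(r#"<link rel="canonical" href="{site_url}{canonical}">"#)
}

fn main() {
    let base = "https://example.com/book/";
    for page in ["guide/README.html", "guide/index.html", "guide/setup.html"] {
        println!("{}", canonical_tag(base, page));
    }
}
```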
Looking forward to seeing this fixed. Thanks!