Normalize heading ids / anchors / hash to lowercase?

Open slorber opened this issue 2 years ago • 10 comments

Have you read the Contributing Guidelines on issues?

Motivation

Docusaurus anchor links are currently case-sensitive (noticed while implementing the anchor broken link checker: https://github.com/facebook/docusaurus/pull/9528)

  • this works: https://docusaurus.io/docs/next#design-principles
  • this fails: https://docusaurus.io/docs/next#Design-Principles

Yet, many other sites implement case-insensitive anchors, and those links will usually work:

  • https://developer.mozilla.org/en-US/docs/Learn/HTML#see_ALSO
  • https://www.markdownguide.org/getting-started/#why-use-MARKDOWN
  • https://jekyllrb.com/docs/posts/#including-IMAGES-and-resources
  • https://www.11ty.dev/docs/languages/javascript/#proMISE

Note: this is not standard browser behavior; these links won't work with JavaScript disabled.


So: should we also implement this?

Since this is not native browser behavior, it is still better for progressive enhancement to have links with the correct case in the first place.

Should we report links with incorrect case in the anchor broken link checker? (this won't block https://github.com/facebook/docusaurus/pull/9528 but we can do a follow-up PR)
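To make the checker question concrete, here is a minimal sketch of how a link could be classified as correct, wrong only by case, or truly broken. The `AnchorCheck` type and the `classifyAnchor` function are hypothetical names used for illustration; they are not taken from the actual checker in the PR above.

```typescript
// Hypothetical result type; NOT the shape used by the real Docusaurus checker.
type AnchorCheck =
  | { status: "ok" }
  | { status: "wrong-case"; suggestion: string }
  | { status: "broken" };

// Classify one anchor (without the leading "#") against the ids
// collected from the target page, in document order.
function classifyAnchor(anchor: string, pageIds: string[]): AnchorCheck {
  // An exact, case-sensitive match is what browsers resolve natively.
  if (pageIds.indexOf(anchor) !== -1) {
    return { status: "ok" };
  }
  // Otherwise look for an id that differs from the anchor only by case.
  const lowered = anchor.toLowerCase();
  for (const id of pageIds) {
    if (id.toLowerCase() === lowered) {
      return { status: "wrong-case", suggestion: id };
    }
  }
  return { status: "broken" };
}
```

With this shape, `#Design-Principles` on a page whose only matching id is `design-principles` would be reported as `wrong-case` with a suggested fix, rather than as a plain broken anchor.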

Self-service

  • [ ] I'd be willing to do some initial work on this proposal myself.

slorber avatar Dec 22 '23 16:12 slorber

I'm +0.5 on this if other tools do the same. However, what if a page has two anchors with different casing? The thing that comes to mind is our write-heading-ids CLI which has an opt-in --maintain-case option, but this can also happen with hand-written ids.

Josh-Cena avatar Dec 24 '23 09:12 Josh-Cena

@slorber if this issue is resolved please close it, if not, let me know, and I will start working on it

surenpoghosian avatar Aug 06 '24 18:08 surenpoghosian

Feel free to send a PR directly: if the issue is open, it's available @surenpoghosian

OzakIOne avatar Aug 06 '24 22:08 OzakIOne

The problem is that we are still unsure what should be implemented exactly 😅

If you want to work on it, please tell us your plan first, because we might disagree with it.

slorber avatar Aug 07 '24 07:08 slorber

Thanks for the quick response @OzakIOne @slorber

My plan is to implement case-insensitive anchor links by normalizing all anchor links to lowercase during generation and handling. I’ll ensure backward compatibility and update the documentation accordingly.

I would also like to add a feature to the anchor broken link checker to report anchors with incorrect case (but I need more guidance for this one).

Does this approach align with what you had in mind?

surenpoghosian avatar Aug 07 '24 15:08 surenpoghosian

Consider the edge cases.

  • An anchor called xyz exists. #XYZ should probably go to it, per this suggestion.
  • An anchor called XYZ exists. Should #xyz go to it?
  • Two anchors, one called xyz and one called XYZ, both exist (legal HTML). What should #XYZ go to? Does their relative order matter?
  • Two anchors, one called xyz and one called XYZ, both exist. What should #xYz go to? Does their relative order matter?

I can't speak for the other sites, but on MDN all anchors are generated as lowercase, so this problem doesn't exist there. Docusaurus anchors are already case-sensitive, so navigation has to happen case-sensitively. Of course, lowercasing anchors could be an opt-in feature, but it can't apply to user-provided components, and it would not be desirable for everyone. This issue is more about normalization of the anchor in the URL, not about anchors (ids) in the markup.

Josh-Cena avatar Aug 07 '24 22:08 Josh-Cena

@Josh-Cena

Hmm, okay, this made me think a little bit longer about these cases...

Addressing Edge Cases:

1. Single Anchor Matching:

  • Case 1: If an anchor called xyz exists, both #XYZ and #xyz should navigate to it if case-insensitive matching is implemented.
  • Case 2: If an anchor called XYZ exists, #xyz should navigate to it under the same principle.

2. Multiple Anchors with Different Cases:

  • Case 3: If both xyz and XYZ exist, the behavior should depend on whether exact case matching or order precedence is preferred:

    • Option A: Exact Case Matching: #XYZ navigates to XYZ, and #xyz navigates to xyz.

    • Option B: Order Precedence: The first anchor in the document’s order would be matched by case-insensitive URLs. For example, if xyz comes first, then #XYZ, #xyz, and any other case variant would match xyz.

  • Case 4: If both xyz and XYZ exist, #xYz should follow the same logic as above:

    • Option A: Case-insensitive matching navigates to the first one it finds.
    • Option B: Exact match to xyz or XYZ, based on the URL’s case, if available; otherwise, use the first one found.
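The two resolution policies discussed above can be sketched as one small function. This is only an illustration under assumptions: `resolveAnchor` and the strategy names `"prefer-exact"` and `"prefer-first"` are hypothetical, and `ids` is assumed to list the page's ids in document order.

```typescript
// Hypothetical strategy names for illustration only.
type ConflictStrategy = "prefer-exact" | "prefer-first";

// Resolve a URL hash (without the leading "#") to one of the ids on the page.
// Returns undefined when no id matches, even case-insensitively.
function resolveAnchor(
  hash: string,
  ids: string[],
  strategy: ConflictStrategy
): string | undefined {
  const lowered = hash.toLowerCase();
  let firstInsensitive: string | undefined;
  for (const id of ids) {
    // "prefer-exact": an id with the same case as the URL always wins.
    if (strategy === "prefer-exact" && id === hash) {
      return id;
    }
    // Remember the first id in document order that matches ignoring case.
    if (firstInsensitive === undefined && id.toLowerCase() === lowered) {
      firstInsensitive = id;
    }
  }
  // "prefer-first" always lands here; "prefer-exact" falls back to it
  // only when no exact match exists.
  return firstInsensitive;
}
```

For a page with ids `xyz` then `XYZ`, `"prefer-exact"` sends `#XYZ` to `XYZ` and `#xYz` to `xyz`, while `"prefer-first"` sends every case variant to `xyz`. Under both strategies the relative order of the two anchors decides where `#xYz` goes.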

Abstract Goals:

  • Keep the current case-sensitive behavior as the default since Docusaurus anchors are already case-sensitive.

  • Introduce an opt-in feature where users can enable case-insensitive URL matching. This option should be clearly documented as potentially leading to conflicts when multiple anchors with different cases exist.

  • When Optional Case-Insensitive Matching is enabled, anchor links are normalized (e.g., lowercased) for matching, but the matching process considers the order of anchors in the document.

  • Warning System: If the opt-in feature is enabled, and multiple anchors with differing cases exist, issue a warning during build or runtime about potential conflicts.

  • Priority Handling: Consider providing a configuration option for users to decide how conflicts are resolved (e.g., “prefer first match,” “prefer exact case,” etc.).
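As a rough sketch of the opt-in and warning goals above, the following collects warnings for ids that differ only by case, and does nothing unless the feature is enabled. The `AnchorOptions` interface, its `caseInsensitive` field, and `findCaseConflicts` are hypothetical names; none of them is an existing Docusaurus config field or API.

```typescript
// Hypothetical options; the default keeps today's case-sensitive behavior.
interface AnchorOptions {
  caseInsensitive: boolean;
}

const defaultAnchorOptions: AnchorOptions = { caseInsensitive: false };

// Return one warning message per group of ids that differ only by case.
// With the feature disabled there is nothing to warn about.
function findCaseConflicts(
  ids: string[],
  options: AnchorOptions = defaultAnchorOptions
): string[] {
  if (!options.caseInsensitive) {
    return [];
  }
  // Null-prototype object so ids such as "constructor" are plain keys.
  const groups: Record<string, string[]> = Object.create(null);
  const order: string[] = [];
  for (const id of ids) {
    const key = id.toLowerCase();
    let spellings = groups[key];
    if (spellings === undefined) {
      spellings = [];
      groups[key] = spellings;
      order.push(key);
    }
    // Exact duplicates are a different problem; only distinct spellings count.
    if (spellings.indexOf(id) === -1) {
      spellings.push(id);
    }
  }
  const warnings: string[] = [];
  for (const key of order) {
    const spellings = groups[key];
    if (spellings !== undefined && spellings.length > 1) {
      warnings.push(`Anchors differ only by case: ${spellings.join(", ")}`);
    }
  }
  return warnings;
}
```

A build step could print these messages, or fail on them, depending on how strict the site wants to be. That choice is left open here, as it is in the discussion.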

surenpoghosian avatar Aug 09 '24 17:08 surenpoghosian

Just to clarify, the opt-in feature would only affect how the URL is handled, not the actual IDs in the markup. This way, user-defined components stay intact, and we keep the current case-sensitive behavior as the default. The goal is to give users more flexibility if they need it, while making sure everything still works smoothly for existing content. Let me know if you have any other thoughts or suggestions!

surenpoghosian avatar Aug 09 '24 17:08 surenpoghosian