
❗️ REFACTOR markdown links

Open chrisjsewell opened this issue 1 year ago • 11 comments

The Issue

In MyST, there is currently limited capability to specify "document-level" references, which work independently of the larger project.

Currently, all sphinx reference roles are project-wide, e.g. {any}, {ref}, {doc}, ... Note that these roles also have two limitations: (a) they are arguably not very "Markdownic", and (b) they cannot support nested syntax in their text.

Then for Markdown links:

  1. [text](https://example.com) is an external link
  2. [text](#target) links only to local "myst anchors", created by the anchor extension
  3. [text](path/to/doc.md) links to another document
  4. [text](path/to/doc.md#target) links only to myst anchors in another document
  5. [text](target) links to anything in the project

Note also that (2), (3), and (4) do not work for (docutils) single page builds, and (5) acts differently depending on whether it is a single page (docutils) or project (sphinx) build.

The Goals

  1. Have [text](#target) link to any target in the local document, and work for docutils and sphinx
  2. Replace [text](target) with a more "specific" syntax, for what the target is targeting

Aside: anatomy of a CommonMark link

[Explicit _Markdown_ text](URI "optional explicit title")

The Uniform Resource Identifier (URI), should generally follow the specification in:

URI = scheme ":" ["//" authority] path ["?" query] ["#" fragment]

Note, if your URI contains spaces, then it can be enclosed in <>, e.g.

[text](<URI with space> "title")
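
As a quick illustration of the URI anatomy above, Python's standard `urllib.parse.urlsplit` separates a link destination into these five components. The URL used here is made up for the example.

```python
from urllib.parse import urlsplit

# A made-up URI that exercises every component of the generic syntax.
parts = urlsplit("https://example.com/path/to/doc.html?key=value#target")

print(parts.scheme)    # https
print(parts.netloc)    # example.com  (the "authority")
print(parts.path)      # /path/to/doc.html
print(parts.query)     # key=value
print(parts.fragment)  # target
```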

Proposal

  1. This PR makes [text](#target) and [text](relative/path/file.md#target) work to reference any "standard" local target (plus anchors).
  2. [text](target) is replaced with a myst scheme, that can have different specificity [text](myst:reftype[?refquery]#target)
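
To make the proposed `myst:reftype[?refquery]#target` shape concrete, here is a minimal sketch that splits such a URI with the standard library. The helper name `parse_myst_uri` and the returned dictionary layout are assumptions for illustration, not part of myst-parser.

```python
from urllib.parse import parse_qs, urlsplit


def parse_myst_uri(uri: str) -> dict:
    """Split a ``myst:reftype[?refquery]#target`` URI into its parts.

    Hypothetical helper for illustration; not part of myst-parser.
    """
    parts = urlsplit(uri)
    if parts.scheme != "myst":
        raise ValueError(f"not a myst URI: {uri!r}")
    return {
        "reftype": parts.path,
        # parse_qs maps each key to a list of values; keep the first one
        "refquery": {key: values[0] for key, values in parse_qs(parts.query).items()},
        "target": parts.fragment,
    }


print(parse_myst_uri("myst:project#target"))
print(parse_myst_uri("myst:doc?t=target#file"))
```

Because the form is a regular URI, no custom tokenizer is needed, which is one practical argument for keeping the syntax inside the CommonMark link destination.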

The implemented link types are currently as follows:

| Link Type | Auto | Inline | Docutils* |
| --- | --- | --- | --- |
| External URL | <https://example.com> | [](https://example.com) | |
| Local file | - | [](file.txt) | |
| Project document | <myst:doc#file> | [](file.md) | |
| Local target | <myst:local#target> | [](#target) | |
| Target in document | <myst:doc?t=target#file> | [](file.md#target) | |
| Target in project | <myst:project#target> | [](myst:project#target) | |
| Target in inventory | <myst:inv#target> | [](myst:inv#target) | |

* these have logic that relies on handling a full project, and so cannot be used when parsing a single document
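
The "Inline" column of the table above can be read as a small decision procedure over the link destination. The following is only a sketch of that reading: the function name `classify_link` is hypothetical, and treating a `.md` suffix as "project document" is an assumption, not the PR's actual logic.

```python
from urllib.parse import urlsplit


def classify_link(destination: str) -> str:
    """Map an inline link destination to a link type from the table.

    Illustrative only: the real resolution logic may differ.
    """
    parts = urlsplit(destination)
    if parts.scheme == "myst":
        # e.g. myst:project#target or myst:inv#target
        myst_types = {"project": "Target in project", "inv": "Target in inventory"}
        return myst_types.get(parts.path, "Unknown myst link")
    if parts.scheme:
        return "External URL"
    if not parts.path:
        return "Local target" if parts.fragment else "Unknown"
    if parts.path.endswith(".md"):  # assumption: .md marks a project document
        return "Target in document" if parts.fragment else "Project document"
    return "Local file"


for dest in ["https://example.com", "file.txt", "file.md", "#target",
             "file.md#target", "myst:project#target", "myst:inv#target"]:
    print(f"[]({dest}) -> {classify_link(dest)}")
```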


Questions:

  • What is the best syntax to represent reftype, reftarget, and refquery?

Note about:

  • difficulty of not wanting to replicate/maintain sphinx stuff, but all those roles don't allow nested text
  • specificity
  • what's allowed by docutils
  • "auto" creation of link text
  • anchor creation

TODO:

  • document path always posix?
  • files / documents relative to project source?
  • Have [text](=target) be an inline target

chrisjsewell avatar Aug 31 '22 16:08 chrisjsewell

Codecov Report

Base: 89.86% // Head: 89.27% // Decreases project coverage by -0.58% :warning:

Coverage data is based on head (ec01c85) compared to base (28725fc). Patch coverage: 88.72% of modified lines in pull request are covered.

:exclamation: Current head ec01c85 differs from pull request most recent head f468ad0. Consider uploading reports for the commit f468ad0 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #613      +/-   ##
==========================================
- Coverage   89.86%   89.27%   -0.59%     
==========================================
  Files          21       24       +3     
  Lines        2150     2826     +676     
==========================================
+ Hits         1932     2523     +591     
- Misses        218      303      +85     
| Flag | Coverage Δ |
| --- | --- |
| pytests | 89.27% <88.72%> (-0.59%) :arrow_down: |

| Impacted Files | Coverage Δ |
| --- | --- |
| myst_parser/config/main.py | 85.96% <72.72%> (-0.96%) :arrow_down: |
| myst_parser/mdit_to_docutils/local_links.py | 82.25% <82.25%> (ø) |
| myst_parser/mdit_to_docutils/base.py | 90.70% <85.83%> (-1.43%) :arrow_down: |
| myst_parser/sphinx_ext/references.py | 86.80% <86.80%> (ø) |
| myst_parser/sphinx_ext/main.py | 90.19% <91.66%> (-0.43%) :arrow_down: |
| myst_parser/mdit_to_docutils/inventory.py | 92.37% <92.37%> (ø) |
| myst_parser/warnings.py | 96.42% <96.42%> (ø) |
| myst_parser/mdit_to_docutils/html_to_nodes.py | 90.90% <100.00%> (+0.16%) :arrow_up: |
| myst_parser/mdit_to_docutils/sphinx_.py | 98.92% <100.00%> (+4.74%) :arrow_up: |
| myst_parser/parsers/docutils_.py | 83.62% <100.00%> (+2.53%) :arrow_up: |

... and 6 more


codecov[bot] avatar Aug 31 '22 16:08 codecov[bot]

I quite like the [text](myst:project#target) syntax.

rowanc1 avatar Aug 31 '22 22:08 rowanc1

This seems like a nice direction to me - I like the idea of myst: being an extension point in the links.

A few quick thoughts:

  • Could the @ symbol be useful here? It has a common connotation as a "reference" symbol for citations in Pandoc. Maybe @target can be a short-hand for just [](#target)? @mydoc.md#target -> [](mydoc.md#target), etc.?
  • Could we use a short-hand for myst: to avoid the extra verbosity (which I think would be a bigger deal if there are lots of in-line references like this)? E.g. []($project#target) would be short-hand for [](myst:project#target). Maybe that's something to think about for the future though, probably not needed now.
  • From a design perspective, I think we should try to disentangle "what is the most intuitive / flexible design for reference syntax" and "what is possible in Docutils / Sphinx". I know that there are obviously important relationships between the two from a practicality perspective, but I feel like our target should be "the best implementation-agnostic MyST spec".
  • Here's Obsidian's design around references, it seems to have a pretty happy/loyal following around it. Although they use wiki-style link syntax (ref: https://github.com/executablebooks/MyST-Parser/issues/421)

choldgraf avatar Sep 03 '22 12:09 choldgraf

Thanks for the comments @choldgraf

> Could the @ symbol be useful here? It has a common connotation as a "reference" symbol for citations in Pandoc.

This is exactly why I didn't want to use it here, since it is specifically reserved for citations, which are not the same as internal links. In the future it is likely that we will want to use @ specifically for this purpose of citation referencing, i.e. referencing a "key" in an external file.

> Could we use a short-hand for myst: to avoid the extra verbosity

"terseness" is certainly a design goal 👍, but yeh I'd just worry about introducing too many "magic" symbols without thinking it through properly. I'd say we want a balance between:

  • commonmark compliance: where possible we should "re-use" the already available syntax, or at least have it degrade nicely
  • rememberability: having a syntax that is easy to remember
  • readability: having a syntax which people can understand at a glance
  • terseness: limiting "boilerplate" syntax
  • extensibility: having syntaxes that will not limit us from adding features in the future

chrisjsewell avatar Sep 03 '22 12:09 chrisjsewell

Makes sense re: citations. I agree that is a different thing.

For short-hand symbols, sounds like something we can just track in a separate issue and revisit if many users report verbosity as a pain point

choldgraf avatar Sep 03 '22 12:09 choldgraf

> For short-hand symbols, sounds like something we can just track in a separate issue and revisit if many users report verbosity as a pain point

Yeh exactly, I also would rather not introduce too many ways of doing the same thing

chrisjsewell avatar Sep 03 '22 12:09 chrisjsewell

Just a note that pandoc @my-label works for both citations and inline references, which as an author makes this very simple to write.

Question on a cross-project link: are you intending the syntax to be [text](myst:project?o=label#target) where project is filled in with the name of the project, keyed off the config? For example, [](myst:spec#directive) if I wanted to target this, where the spec name/id was defined somewhere in my config?

From the updated description it looks like you might be adding this as the i part instead? Going with something like the above might cut back on the verbosity and number of keys/query params you have to remember (i.e. the local project is named project) and the last two link examples are the same.

What is the o= query param supposed to do? Assuming d is for domain? And i could disappear if using the name of the project after myst:.

rowanc1 avatar Sep 05 '22 22:09 rowanc1

New documentation is ready: https://myst-parser--613.org.readthedocs.build/en/613/syntax/syntax.html#links-and-referencing

chrisjsewell avatar Sep 06 '22 18:09 chrisjsewell

Had an hour-long convo with @chrisjsewell today, some notes below!

Summary

We should simplify some of the syntax:

  • Targets [](#target) look up locally, then project wide (but not externally)
    • Targets can be explicitly done under the <project:> protocol, which can have an optional file path.
    • These are really only used for completeness, and documentation points people towards markdown links
    • This is nice, because vscode autocompletes, the syntax is really terse, and we don't lose anything (I don't think)
  • The myst: protocol is followed by the project, rather than inv
  • Relative and absolute paths work. Absolute paths are from the project root. The path separator is posix /
  • This should work with external objects.inv from intersphinx, and these are named explicitly in the config.yml or config.py and can be looked up.
    • For example: [](myst:jupyterbook#getting-started)
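
The path rule in the summary above (relative paths from the linking document, absolute paths from the project root, always posix separators) can be sketched with `posixpath`. The function name `resolve_doc_path` and the `current_doc` parameter are illustrative assumptions, not the PR's implementation.

```python
import posixpath


def resolve_doc_path(link_path: str, current_doc: str) -> str:
    """Return a project-root-relative POSIX path for a link destination.

    ``current_doc`` is the linking document's path relative to the project
    root. Hypothetical helper; the names are illustrative.
    """
    if link_path.startswith("/"):
        # Absolute paths are taken from the project root
        resolved = link_path.lstrip("/")
    else:
        # Relative paths are taken from the linking document's folder
        resolved = posixpath.join(posixpath.dirname(current_doc), link_path)
    return posixpath.normpath(resolved)


print(resolve_doc_path("./abstract.md", "chapters/intro.md"))
print(resolve_doc_path("/file/path.md", "chapters/intro.md"))
```

Using `posixpath` rather than `os.path` keeps the result identical on Windows, which matches the "separator is posix /" note.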

Scratch Notes:


```yaml
intersphinx:
    jupyterbook: (https://..., None)
```

[see external figure](#equation)
[see external figure](#equation)

<project:#equation> % This tries local and then the project
<project:file.md#equation> % This only tries the specific file
<project:/file/path.md#equation> % You can do the local file in the project, but it is a bit awkward
[](./abstract.md) % strips the md
[](_toc.yml) % downloads the thing, split the fragment off, (maybe warn?)
[](/) % This is from the root of the project.

file1.md
# introduction

[](#introduction)            % links locally, always, warns if it is implicit

file2.md
(introduction)=              % this should warn (this is a sphinx warning)
# some other header


* resolves explicit local
* resolves implicit local     (warn if you are trying to link to implicit)
* resolves explicit project
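
A rough sketch of the three-step lookup order listed above, under the assumption that targets are held in plain dictionaries (name to location). That data model, and the function name `resolve_target`, are invented for illustration.

```python
import warnings


def resolve_target(name, explicit_local, implicit_local, explicit_project):
    """Resolve ``[](#name)`` following the order in the notes above.

    Each argument after ``name`` is a dict mapping target names to a
    location; this data model is an assumption made for illustration.
    """
    if name in explicit_local:
        return explicit_local[name]
    if name in implicit_local:
        # e.g. a heading that was never given an explicit label
        warnings.warn(f"linking to implicit target {name!r}", stacklevel=2)
        return implicit_local[name]
    if name in explicit_project:
        return explicit_project[name]
    return None
```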


[see external figure](myst:jupyterbook#equation)

<myst:jupyterbook#equation>

{external+jupyterbook:py:class}`equation`


% File part is posix

<myst:doc#file> --> <project:file>
<myst:doc?t=target#file> --> <project:file#target>

<myst:inv#target> --> <myst:jupyterbook#target>
[](myst:inv#target) --> <myst:jupyterbook#target>

rowanc1 avatar Sep 06 '22 22:09 rowanc1

> Had an hour long convo with @chrisjsewell today! Some notes below, will clean up this in a sec!

Yep cheers, actually turned into 2.5 hours 😅 with plenty of actionable items 👌

chrisjsewell avatar Sep 06 '22 22:09 chrisjsewell

Just to add here a part of the design spec I was working on, how sphinx internal targets work:

Sphinx internal target specification

At a minimum, a target must have fields: domain, object_type, docname, name and id.

The name should be unique per domain and object_type. The user should be able to reference the target using name, and optionally filter by domain and/or object_type (these should contain only a-z). Names are lower-cased, and whitespace-normalised (all whitespace is replaced with a single space).

The id should be unique per docname. This is generated internally and need not be exposed to the user. It should comply with the regex [a-z](-?[a-z0-9]+)*. (Tip: to make it unique, append env.new_serialno().)
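
The name and id rules above translate fairly directly into code. This is a minimal sketch: the helper names are hypothetical, and stripping leading/trailing whitespace during normalisation is an extra assumption that the spec text does not state.

```python
import re

# The id regex quoted in the specification above
ID_PATTERN = re.compile(r"[a-z](-?[a-z0-9]+)*")


def normalise_name(name: str) -> str:
    """Lower-case a target name and collapse each whitespace run to one space.

    Stripping leading/trailing whitespace is an extra assumption here.
    """
    return " ".join(name.lower().split())


def is_valid_id(candidate: str) -> bool:
    """Check that the whole string matches the id regex."""
    return ID_PATTERN.fullmatch(candidate) is not None


print(normalise_name("My   Target\nName"))
print(is_valid_id("my-id-1"), is_valid_id("a--b"))
```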

Each target name can optionally have an implicit text field, which is the default text used when referencing the target, if no explicit text is provided by the user.

Each target can also have enum_type and number fields. All number fields must be unique per enum_type.

A Domain class is responsible for storing and retrieving targets for its object_type and enforcing the above uniqueness constraints.

A Domain should implement the get_objects() method, which returns an iterator of all targets for the domain: (name, text, object_type, docname, id, priority). This should be available on reference resolution, after all documents have been parsed. text can be empty and priority is used to resolve conflicts when multiple targets have the same name.

New for myst-parser: A Domain can optionally implement a get_object_enum(docname, otype, name) method, which returns the (enum_type, number) for the target or (None, None) if not available. This should be available on reference resolution, after all documents have been parsed.
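
To show how the two methods described above fit together, here is a toy, self-contained stand-in. It deliberately does not subclass Sphinx's `Domain`; the `Target` dataclass, `SketchDomain`, and `add_target` are invented names, and only the name-per-object_type uniqueness check is included (id and number uniqueness are left out for brevity).

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional, Tuple


@dataclass
class Target:
    object_type: str
    docname: str
    name: str
    id: str
    text: str = ""            # implicit text; may be empty
    priority: int = 1
    enum_type: Optional[str] = None
    number: Optional[int] = None


class SketchDomain:
    """Toy stand-in for a domain; it does not subclass Sphinx's Domain."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._targets: Dict[Tuple[str, str], Target] = {}

    def add_target(self, target: Target) -> None:
        # Enforce: name is unique per domain and object_type
        key = (target.object_type, target.name)
        if key in self._targets:
            raise ValueError(f"duplicate target {key!r} in domain {self.name!r}")
        self._targets[key] = target

    def get_objects(self) -> Iterator[Tuple[str, str, str, str, str, int]]:
        for t in self._targets.values():
            yield (t.name, t.text, t.object_type, t.docname, t.id, t.priority)

    def get_object_enum(self, docname: str, otype: str, name: str):
        t = self._targets.get((otype, name))
        if t is None or t.docname != docname or t.enum_type is None:
            return (None, None)
        return (t.enum_type, t.number)
```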

Output formats:

  • html: id is used as the id attribute of the target element, and reference anchors use href="#<id>".
  • latex: docname and id are used to generate the label \label{identifier}, where identifier is an escaped version of <docname>:<id>. In numbered references, rather than explicitly adding the name or number, \nameref{identifier} and \ref{identifier} are used, so that latex can handle the numbering.
  • singlehtml: Not currently working (https://github.com/sphinx-doc/sphinx/issues/4814), but should work the same as latex, to combine docname and id.
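
As a small sketch of the output formats above: html needs only the id, while latex combines docname and id. The function names are hypothetical, and the LaTeX escaping step mentioned in the list is intentionally left out because its rules are not given here.

```python
def html_href(target_id: str) -> str:
    """Reference anchor for html output: href="#<id>"."""
    return f"#{target_id}"


def latex_identifier(docname: str, target_id: str) -> str:
    """Unescaped <docname>:<id> identifier used for latex labels.

    Escaping for LaTeX is deliberately left out of this sketch.
    """
    return f"{docname}:{target_id}"


print(html_href("my-target"))
print(latex_identifier("intro", "my-target"))
```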

chrisjsewell avatar Sep 12 '22 19:09 chrisjsewell