MyST-Parser
❗️ REFACTOR markdown links
The Issue
In MyST, there is currently limited capability to specify "document-level" references, i.e. references that work independently of the larger project.
Currently, all sphinx reference roles are project wide, e.g. {any}, {ref}, {doc}, ...
Note, these roles also have two limitations: (a) they are maybe not so "Markdownic", and (b) they cannot support nested syntax in their text.
Then for Markdown links:
1. `[text](https://example.com)` is an external link
2. `[text](#target)` links only to local "myst anchors", created by the anchor extension
3. `[text](path/to/doc.md)` links to another document
4. `[text](path/to/doc.md#target)` links only to myst anchors in another document
5. `[text](target)` links to anything in the project

Note also that (2), (3), and (4) do not work for (docutils) single page builds, and (5) acts differently depending on whether the build is single page (docutils) or project (sphinx).
The Goals
- Have `[text](#target)` link to any target in the local document, and work for both docutils and sphinx
- Replace `[text](target)` with a more "specific" syntax, for what the target is targeting
Aside: anatomy of a CommonMark link
[Explicit _Markdown_ text](URI "optional explicit title")
The Uniform Resource Identifier (URI), should generally follow the specification in:
URI = scheme ":" ["//" authority] path ["?" query] ["#" fragment]
Note, if your URI has spaces in, then it can be enclosed in <>, e.g.
[text](<URI with space> "title")
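As a minimal sketch of the URI anatomy above, the standard library's `urllib.parse.urlsplit` separates a link destination into the same five components. The helper name `split_link` is hypothetical and is not part of MyST-Parser; the `myst:` example uses the scheme discussed in this PR.

```python
from urllib.parse import urlsplit


def split_link(uri: str) -> dict:
    """Split a link destination into scheme, authority, path, query, fragment."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        "authority": parts.netloc,
        "path": parts.path,
        "query": parts.query,
        "fragment": parts.fragment,
    }


# An external link, a relative document link, and a custom-scheme link.
for uri in (
    "https://example.com/a/b?x=1#frag",
    "path/to/doc.md#target",
    "myst:project#target",
):
    print(uri, "->", split_link(uri))
```

Note that a destination with no scheme (such as `path/to/doc.md#target` or `#target`) still splits cleanly into a path and a fragment, which is what makes the different link forms distinguishable.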
Proposal
- This PR makes `[text](#target)` and `[text](relative/path/file.md#target)` work to reference any "standard" local target (plus anchors).
- `[text](target)` is replaced with a `myst` scheme, that can have different specificity: `[text](myst:reftype[?refquery]#target)`
The implemented link types are currently as follows:
| Link Type | Auto | Inline | Docutils* |
|---|---|---|---|
| External URL | `<https://example.com>` | `[](https://example.com)` | ✅ |
| Local file | - | `[](file.txt)` | ❌ |
| Project document | `<myst:doc#file>` | `[](file.md)` | ❌ |
| Local target | `<myst:local#target>` | `[](#target)` | ✅ |
| Target in document | `<myst:doc?t=target#file>` | `[](file.md#target)` | ❌ |
| Target in project | `<myst:project#target>` | `[](myst:project#target)` | ❌ |
| Target in inventory | `<myst:inv#target>` | `[](myst:inv#target)` | ❌ |
\* Link types marked ❌ have logic that relies on handling a full project, and so cannot be used when parsing a single document.
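To make the table concrete, here is a rough sketch of how a link destination could be classified into these link types. It is not the implementation in this PR: the function name `classify_link`, the `DOC_SUFFIXES` tuple, and the rule "a path ending in `.md` is a project document" are assumptions made for illustration only.

```python
from urllib.parse import parse_qs, urlsplit

# Assumption: only ".md" files count as project documents in this sketch.
DOC_SUFFIXES = (".md",)

# Link types for the myst: scheme, keyed by the reftype after "myst:".
MYST_REFTYPES = {
    "local": "Local target",
    "project": "Target in project",
    "inv": "Target in inventory",
}


def classify_link(uri: str) -> str:
    """Return the 'Link Type' column value for a link destination."""
    parts = urlsplit(uri)
    if parts.scheme == "myst":
        if parts.path == "doc":
            # myst:doc#file is a document; myst:doc?t=target#file adds a target.
            has_target = "t" in parse_qs(parts.query)
            return "Target in document" if has_target else "Project document"
        return MYST_REFTYPES.get(parts.path, "Unknown")
    if parts.scheme:
        return "External URL"
    if not parts.path:
        return "Local target" if parts.fragment else "Unknown"
    if parts.path.endswith(DOC_SUFFIXES):
        return "Target in document" if parts.fragment else "Project document"
    return "Local file"


for uri in ("https://example.com", "file.txt", "file.md", "#target",
            "file.md#target", "myst:project#target", "myst:inv#target"):
    print(f"{uri!r}: {classify_link(uri)}")
```

The "Auto" and "Inline" forms in the same row classify to the same type here, e.g. `myst:doc?t=target#file` and `file.md#target` both give "Target in document".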
Questions:
- How best to have syntax to represent `reftype`, `reftarget`, `refquery`?
Note about:
- difficulty of not wanting to replicate/maintain sphinx stuff, but all those roles don't allow nested text
- specificity
- what's allowed by docutils
- "auto" creation of link text
- anchor creation
TODO:
- document path always posix?
- files / documents relative to project source?
- Have `[text](=target)` be an inline target
Codecov Report
Base: 89.86% // Head: 89.27% // Decreases project coverage by -0.58% :warning:
Coverage data is based on head (ec01c85) compared to base (28725fc). Patch coverage: 88.72% of modified lines in pull request are covered.
:exclamation: Current head ec01c85 differs from pull request most recent head f468ad0. Consider uploading reports for the commit f468ad0 to get more accurate results
Additional details and impacted files
@@ Coverage Diff @@
## master #613 +/- ##
==========================================
- Coverage 89.86% 89.27% -0.59%
==========================================
Files 21 24 +3
Lines 2150 2826 +676
==========================================
+ Hits 1932 2523 +591
- Misses 218 303 +85
| Flag | Coverage Δ | |
|---|---|---|
| pytests | 89.27% <88.72%> (-0.59%) | :arrow_down: |
| Impacted Files | Coverage Δ | |
|---|---|---|
| myst_parser/config/main.py | 85.96% <72.72%> (-0.96%) | :arrow_down: |
| myst_parser/mdit_to_docutils/local_links.py | 82.25% <82.25%> (ø) | |
| myst_parser/mdit_to_docutils/base.py | 90.70% <85.83%> (-1.43%) | :arrow_down: |
| myst_parser/sphinx_ext/references.py | 86.80% <86.80%> (ø) | |
| myst_parser/sphinx_ext/main.py | 90.19% <91.66%> (-0.43%) | :arrow_down: |
| myst_parser/mdit_to_docutils/inventory.py | 92.37% <92.37%> (ø) | |
| myst_parser/warnings.py | 96.42% <96.42%> (ø) | |
| myst_parser/mdit_to_docutils/html_to_nodes.py | 90.90% <100.00%> (+0.16%) | :arrow_up: |
| myst_parser/mdit_to_docutils/sphinx_.py | 98.92% <100.00%> (+4.74%) | :arrow_up: |
| myst_parser/parsers/docutils_.py | 83.62% <100.00%> (+2.53%) | :arrow_up: |
| ... and 6 more | | |
I quite like the [text](myst:project#target) syntax.
This seems like a nice direction to me - I like the idea of myst: being an extension point in the links.
A few quick thoughts:
- Could the `@` symbol be useful here? It has a common connotation as a "reference" symbol for citations in Pandoc. Maybe `@target` can be a short-hand for just `[](#target)`? `@mydoc.md#target` -> `[](mydoc.md#target)` etc?
- Could we use a short-hand for `myst:` to avoid the extra verbosity (which I think would be a bigger deal if there are lots of in-line references like this)? E.g. `[]($project#target)` would be short-hand for `[](myst:project#target)`. Maybe that's something to think about for the future though, probably not needed now.
- From a design perspective, I think we should try to disentangle "what is the most intuitive / flexible design for reference syntax" and "what is possible in Docutils / Sphinx". I know that there are obviously important relationships between the two from a practicality perspective, but I feel like our target should be "the best implementation-agnostic MyST spec".
- Here's Obsidian's design around references, it seems to have a pretty happy/loyal following around it. Although they use wiki-style link syntax (ref: https://github.com/executablebooks/MyST-Parser/issues/421)
Thanks for the comments @choldgraf
> Could the @ symbol be useful here? It has a common connotation as a "reference" symbol for citations in Pandoc.

This is exactly why I didn't want to use it here, since it is specifically reserved for citations, which are not the same as internal links. In the future it is likely that we will want to use `@` specifically for this purpose of citation referencing, i.e. referencing a "key" in an external file.
> Could we use a short-hand for `myst:` to avoid the extra verbosity

"terseness" is certainly a design goal 👍, but yeh I'd just worry about introducing too many "magic" symbols without thinking it through properly. I'd say we want a balance between:
- commonmark compliance: where possible we should "re-use" the already available syntax, or at least have it degrade nicely
- rememberability: having a syntax that is easy to remember
- readability: having a syntax which people can understand at a glance
- terseness: limiting "boilerplate" syntax
- extensibility: having syntaxes that will not limit us from adding features in the future
Makes sense re: citations. I agree that is a different thing.
For short-hand symbols, sounds like something we can just track in a separate issue and revisit if many users report verbosity as a pain point
> For short-hand symbols, sounds like something we can just track in a separate issue and revisit if many users report verbosity as a pain point
Yeh exactly, I also would rather not introduce too many ways of doing the same thing
Just a note that pandoc @my-label works for both citations and inline references, which as an author makes this very simple to write.
Question on a cross-project link: are you intending the syntax to be [text](myst:project?o=label#target) where project is filled in with the name of the project, keyed off the config? For example, [](myst:spec#directive) if I wanted to target this, where the spec name/id was defined somewhere in my config?
From the updated description it looks like you might be adding this as the i part instead? Going with something like the above might cut back on the verbosity and number of keys/query params you have to remember (i.e. the local project is named project) and the last two link examples are the same.
What is the `o=` query param supposed to do? Assuming `d` is for domain? And `i` could disappear if using the name of the project after `myst:`.
New documentation is ready: https://myst-parser--613.org.readthedocs.build/en/613/syntax/syntax.html#links-and-referencing
Had an hour long convo with @chrisjsewell today, some notes below!
Summary
We should simplify some of the syntax:
- Targets `[](#target)` look up locally, then project wide (but not externally)
  - Targets can be explicitly done under the `<project:>` protocol, which can have an optional file path.
  - These are really only used for completeness, and documentation points people towards markdown links
  - This is nice, because vscode autocompletes, and the syntax is really terse and we don't lose anything (I don't think)
- The `myst:` protocol is followed by the project, rather than `inv`
- Relative and absolute paths work. Absolute paths are from the project root. The path separator is posix `/`
- This should work with external `objects.inv` from intersphinx, and these are named explicitly in the `config.yml` or `config.py` and can be looked up.
  - For example: `[](myst:jupyterbook#getting-started)`
Scratch Notes:
```yaml
intersphinx:
jupyterbook: (https://..., None)
```
[see external figure](#equation)
[see external figure](#equation)
<project:#equation> % This tries local and then the project
<project:file.md#equation> % This only tries the specific file
<project:/file/path.md#equation> % You can do the local file in the project, but it is a bit awkward
[](./abstract.md) % strips the md
[](_toc.yml) % downloads the thing, split the fragment off, (maybe warn?)
[](/) % This is from the root of the project.
file1.md
# introduction
[](#introduction) % links locally, always, warns if it is implicit
file2.md
(introduction)= % this should warn (this is a sphinx warning)
# some other header
* resolves explicit local
* resolves implicit local (warn if you are trying to link to implicit)
* resolves explicit project
[see external figure](myst:jupyterbook#equation)
<myst:jupyterbook#equation>
{external+jupyterbook:py:class}`equation`
% File part is posix
<myst:doc#file> --> <project:file>
<myst:doc?t=target#file> --> <project:file#target>
<myst:inv#target> --> <myst:jupyterbook#target>
[](myst:inv#target) --> <myst:jupyterbook#target>
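The lookup order in the notes above (explicit local, then implicit local with a warning, then explicit project) could be sketched roughly as follows. This is only an illustration of the ordering: `resolve_target` and the three dictionaries are hypothetical names, and the real resolution would go through docutils/Sphinx rather than plain dicts.

```python
import warnings
from typing import Dict, Optional


def resolve_target(
    name: str,
    local_explicit: Dict[str, str],
    local_implicit: Dict[str, str],
    project_explicit: Dict[str, str],
) -> Optional[str]:
    """Resolve `[](#name)` using the order described in the notes."""
    # 1. Explicit targets in the current document, e.g. `(name)=`.
    if name in local_explicit:
        return local_explicit[name]
    # 2. Implicit targets in the current document, e.g. heading anchors.
    #    These resolve, but the author is warned about relying on them.
    if name in local_implicit:
        warnings.warn(f"'#{name}' resolved to an implicit local target")
        return local_implicit[name]
    # 3. Explicit targets anywhere else in the project.
    return project_explicit.get(name)
```

With the `file1.md` / `file2.md` example from the notes, resolving `introduction` from inside `file1.md` would return the local heading (with a warning) even though `file2.md` defines an explicit `(introduction)=` target.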
> Had an hour long convo with @chrisjsewell today! Some notes below, will clean up this in a sec!
Yep cheers, actually turned in to 2.5 hours 😅 with plenty of actionable items 👌
Just to add here a part of the design spec I was working on, how sphinx internal targets work:
Sphinx internal target specification
At a minimum, a target must have fields: domain, object_type, docname, name and id.
The name should be unique per domain and object_type.
The user should be able to reference the target using name, and optionally filter by domain and/or object_type (these should contain only a-z).
Names are lower-cased, and whitespace-normalised (all whitespace is replaced with a single space).
The id should be unique per docname.
This is generated internally and need not be exposed to the user.
It should comply with the regex [a-z](-?[a-z0-9]+)*.
(Tip: to make it unique, append `env.new_serialno()`.)
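A small sketch of the name and id rules above, using only the standard library. The helper names are hypothetical; stripping leading and trailing whitespace is an assumption on top of the "single space" rule.

```python
import re

# The id regex quoted in the specification above.
ID_PATTERN = re.compile(r"[a-z](-?[a-z0-9]+)*")


def normalize_name(name: str) -> str:
    """Lower-case a name and collapse every whitespace run to one space.

    Assumption: leading/trailing whitespace is also stripped.
    """
    return " ".join(name.lower().split())


def is_valid_id(identifier: str) -> bool:
    """Check that the whole identifier matches the id regex."""
    return ID_PATTERN.fullmatch(identifier) is not None
```

For example, `normalize_name("  My   Target\tName ")` gives `"my target name"`, and `is_valid_id` accepts `my-target-1` but rejects `my--target`, `1abc`, and `My-Target`.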
Each target name can optionally have an implicit text field,
which is the default text used when referencing the target, if no explicit text is provided by the user.
Each target can also have an enum_type and number fields.
All number fields must be unique per enum_type.
A Domain class is responsible for storing and retrieving targets for its object_type and enforcing the above uniqueness constraints.
A Domain should implement the get_objects() method, which returns an iterator of all targets for the domain:
(name, text, object_type, docname, id, priority).
This should be available on reference resolution, after all documents have been parsed.
text can be empty and priority is used to resolve conflicts when multiple targets have the same name.
New for myst-parser:
A Domain can optionally implement a get_object_enum(docname, otype, name) method, which returns the (enum_type, number) for the target or (None, None) if not available.
This should be available on reference resolution, after all documents have been parsed.
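The specification above could be modelled roughly like this. It is a standalone sketch, not a subclass of `sphinx.domains.Domain`: the class names `Target` and `SketchDomain`, the `add` method, and the tuple type used for `number` are all assumptions for illustration. Only the field names, the `get_objects()` tuple order, and the `get_object_enum` signature come from the text.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, Optional, Set, Tuple


@dataclass(frozen=True)
class Target:
    domain: str
    object_type: str
    docname: str
    name: str
    id: str
    text: str = ""          # implicit reference text, may be empty
    priority: int = 1
    enum_type: Optional[str] = None
    number: Optional[Tuple[int, ...]] = None  # assumed representation


class SketchDomain:
    """Stores targets for one domain and enforces the uniqueness rules."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._targets: Dict[Tuple[str, str], Target] = {}
        self._ids: Set[Tuple[str, str]] = set()
        self._numbers: Set[Tuple[str, Tuple[int, ...]]] = set()

    def add(self, target: Target) -> None:
        key = (target.object_type, target.name)
        if key in self._targets:
            raise ValueError(f"duplicate name for object type: {key}")
        # Only checked within this domain in the sketch.
        if (target.docname, target.id) in self._ids:
            raise ValueError(f"duplicate id in document: {target.id}")
        if target.enum_type is not None and target.number is not None:
            enum_key = (target.enum_type, target.number)
            if enum_key in self._numbers:
                raise ValueError(f"duplicate number: {enum_key}")
            self._numbers.add(enum_key)
        self._targets[key] = target
        self._ids.add((target.docname, target.id))

    def get_objects(self) -> Iterator[Tuple[str, str, str, str, str, int]]:
        for t in self._targets.values():
            yield (t.name, t.text, t.object_type, t.docname, t.id, t.priority)

    def get_object_enum(self, docname: str, otype: str, name: str):
        t = self._targets.get((otype, name))
        if t is None or t.docname != docname or t.enum_type is None:
            return (None, None)
        return (t.enum_type, t.number)
```

Keying the store on `(object_type, name)` makes the "unique per domain and object_type" rule a plain dictionary lookup, and `get_object_enum` falls back to `(None, None)` for unnumbered or unknown targets as described.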
Output formats:
- `html`: `id` is used as the `id` attribute of the target element, and reference anchors use `href="#<id>"`.
- `latex`: `docname` and `id` are used to generate the label `\label{identifier}`, where `identifier` is an escaped version of `<docname>:<id>`. In numbered references, rather than explicitly adding the name or number, `\nameref{identifier}` and `\ref{identifier}` are used, so that latex can handle the numbering.
- `singlehtml`: Not currently working (https://github.com/sphinx-doc/sphinx/issues/4814), but should work the same as `latex`, to combine `docname` and `id`.