Improve CodeBlock Extensibility
Have you read the Contributing Guidelines on issues?
- [x] I have read the Contributing Guidelines on issues.
Description
The main goal of this feature is to let developers extend syntax highlighting in a React/Docusaurus-compatible way, and thereby work around the fact that Prism.js plugins are not available.
Has this been requested on Canny?
No response
Motivation
Docusaurus and prism-react-renderer do not allow standard Prism.js plugins because those plugins work by manipulating the DOM. It has been discussed that Prism plugins are not supported, and using alternative components for syntax highlighting has also been discussed:
- https://github.com/facebook/docusaurus/issues/3318
- https://github.com/facebook/docusaurus/issues/9122
With this proposal, developers would get a mechanism for enriching syntax highlighting through magic comments and swizzling, without major impact on Docusaurus core functionality.
My personal goal is to develop a plugin that generates hyperlinks for individual rendered tokens, so users can jump to the related reference documentation. The current implementation requires me to swizzle and adapt a major part of the CodeBlock components to achieve this.
API design
As of today Docusaurus offers:
- Magic comments as a means of highlighting individual lines via CSS classes.
- The metadata string as a means of injecting simple options.
- Fine-grained swizzling as a mechanism for customizing component rendering.
The extensibility of CodeBlocks should be designed along the same lines. As a first step, the goal is to let individual website authors customize the components to their needs. At this point it is a non-goal to provide a fully fledged plugin system where you could pull npm packages and register them. This could become an option in the future.
The concrete proposal is to:
- Provide a mechanism for end users to inject "plugin" configurations. (optional but recommended)
1.1. Parse the metadata string into an option bag.
The metadata string is currently kept as a plain string and parsed ad hoc in several places. Docusaurus should attempt to parse the options with a defined syntax:
grammar ExprParser;
metadataComment
metadata
: rangeSyntax? optionSyntax* EOF;
rangeSyntax: '{' lineRange (',' lineRange)* '}';
lineRange: INT
| INT '-' INT;
optionSyntax: ID
| ID '=' ID
| ID '=' STRING;
INT : [0-9]+ ;
ID: [a-zA-Z_][a-zA-Z_0-9]* ;
WS: [ \t\n\r\f]+ -> skip ;
STRING : '"' ~[<"]* '"' | '\'' ~[<']* '\'';
The options are then parsed into a defined data structure for easy access in later steps. Docusaurus itself could also benefit from this bag internally instead of doing "contains" checks:
https://github.com/facebook/docusaurus/blob/3782244ce7fd95fdb3f94444884ae5a1a85da915/packages/docusaurus-theme-common/src/utils/codeBlockUtils.ts#L155-L165
Options are meant to be unique: if an option is specified more than once, the later value overwrites the earlier one. Support for arrays or maps as values could be added later if the need arises.
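A minimal sketch of how the grammar above could be turned into an option bag. The names `parseMetastring`, `ParsedMetadata`, and `OptionValue` are hypothetical and do not exist in Docusaurus. As a simplification, whitespace around `=` is not accepted here, although the grammar's skipped `WS` rule would allow it.

```typescript
// Hypothetical types and names for illustration; not part of Docusaurus.
type OptionValue = string | true;

interface ParsedMetadata {
  // Line numbers expanded from the `{1,3-5}` range syntax.
  lines: number[];
  // Option bag; a repeated option overwrites the earlier value.
  options: Record<string, OptionValue>;
}

const ID = '[a-zA-Z_][a-zA-Z_0-9]*';

// rangeSyntax: '{' lineRange (',' lineRange)* '}'
const RANGE_PATTERN =
  /^\{\s*(\d+(?:\s*-\s*\d+)?(?:\s*,\s*\d+(?:\s*-\s*\d+)?)*)\s*\}/;

// optionSyntax: ID | ID '=' ID | ID '=' STRING
// STRING excludes '<' and the closing quote, mirroring the grammar.
const OPTION_PATTERN = new RegExp(
  `^\\s*(${ID})(?:=(?:"([^<"]*)"|'([^<']*)'|(${ID})))?(?=\\s|$)`,
);

function parseMetastring(metastring: string): ParsedMetadata {
  const result: ParsedMetadata = {lines: [], options: {}};
  let rest = metastring.trim();

  const range = RANGE_PATTERN.exec(rest);
  if (range) {
    for (const part of range[1].split(',')) {
      const bounds = part.split('-').map((n) => parseInt(n.trim(), 10));
      const start = bounds[0];
      const end = bounds.length > 1 ? bounds[1] : start;
      for (let line = start; line <= end; line++) {
        result.lines.push(line);
      }
    }
    rest = rest.slice(range[0].length);
  }

  while (rest.trim() !== '') {
    const match = OPTION_PATTERN.exec(rest);
    if (!match) {
      // Input that does not fit the grammar is reported, not guessed at.
      throw new Error(`Unexpected metadata input: ${rest.trim()}`);
    }
    const quoted: string | undefined = match[2] ?? match[3];
    const bare: string | undefined = match[4];
    // A bare ID without a value is stored as `true`.
    result.options[match[1]] = quoted ?? bare ?? true;
    rest = rest.slice(match[0].length);
  }
  return result;
}
```

With this shape, a check such as "does the metastring contain `showLineNumbers`" becomes a lookup like `parsed.options.showLineNumbers === true` rather than a substring test.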
1.2. Metadata Magic Comments
The metadata string requires all options to be placed on a single line, which becomes limiting when many options are needed. It would therefore be a useful extension to also allow the options from 1.1 to be specified via a magic comment syntax.
These magic comments would be parsed into the property bag and then erased.
For a simple initial version, we parse the whole code for magic comments and fill the bag at once. Later this could be extended so that options can be changed at any point in the code, affecting subsequent lines.
An option comment starts with a configurable special prefix (to be defined), should not interfere with the majority of real comments, and allows only one option per comment. The configuration option prism: { optionCommentPrefix: '!docusaurus-' } lets users eliminate conflicts.
// !docusaurus-option
// !docusaurus-option2=true
// !docusaurus-option3=value
// !docusaurus-option4="long value"
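The "parse into the bag, then erase" behavior could look roughly like the sketch below. The helper `extractOptionComments` and its result type are hypothetical, the prefix is the example value from the proposal, and only `//` line comments are handled; a real implementation would need to cover each language's comment style.

```typescript
// Hypothetical helper for illustration; not part of Docusaurus.
interface ExtractedOptions {
  // The code with all option comments removed.
  code: string;
  // Options collected from the comments; later ones overwrite earlier ones.
  options: Record<string, string | true>;
}

function extractOptionComments(
  code: string,
  prefix: string = '!docusaurus-',
): ExtractedOptions {
  const options: Record<string, string | true> = {};
  // Escape regex metacharacters so any configured prefix is matched literally.
  const escapedPrefix = prefix.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  const id = '[a-zA-Z_][a-zA-Z_0-9]*';
  // One option per comment: `// <prefix>name`, `name=value`, or `name="value"`.
  const pattern = new RegExp(
    `^\\s*//\\s*${escapedPrefix}(${id})(?:=(?:"([^"]*)"|'([^']*)'|(${id})))?\\s*$`,
  );

  const keptLines: string[] = [];
  for (const line of code.split('\n')) {
    const match = pattern.exec(line);
    if (!match) {
      // Ordinary code and ordinary comments are kept untouched.
      keptLines.push(line);
      continue;
    }
    const value: string | undefined = match[2] ?? match[3] ?? match[4];
    options[match[1]] = value === undefined ? true : value;
  }
  return {code: keptLines.join('\n'), options};
}
```

Because the whole line must match the pattern, a regular comment that merely mentions the prefix in the middle of a sentence is left in the rendered code.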
- Create a new swizzlable component for rendering individual tokens and ensure they receive the metadata bag. (required for MVP)
2.1. Add a swizzlable LineToken component
Currently the Line component creates a span preventing any customization of the rendered token.
https://github.com/facebook/docusaurus/blob/3782244ce7fd95fdb3f94444884ae5a1a85da915/packages/docusaurus-theme-classic/src/theme/CodeBlock/Line/index.tsx#L44
A new component would allow swizzling and customizing the rendering at the token level (e.g. a dev could render an <a href=""> instead).
2.2. Pass through the metadata options.
The parsed options from step 1 should be passed down to the Line and LineToken components so devs can make use of them. If we decide not to parse the metadata string (yet?), at least the plain string available in String.tsx should be passed through.
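To illustrate what a swizzled LineToken could do with the option bag, here is a framework-free sketch. It returns a plain descriptor instead of JSX so that it runs without React; an actual swizzled component would render the element instead. The `Token` shape is a local assumption loosely modeled on highlighter tokens, and `renderLineToken`, `RenderedToken`, and the `linkBase` option are hypothetical names.

```typescript
// Local, assumed token shape for illustration only.
interface Token {
  types: string[];
  content: string;
}

type OptionBag = Record<string, string | true>;

// Plain descriptor standing in for a rendered element.
interface RenderedToken {
  tag: 'span' | 'a';
  props: Record<string, string>;
  text: string;
}

function renderLineToken(token: Token, options: OptionBag): RenderedToken {
  const className = ['token'].concat(token.types).join(' ');
  const linkBase = options.linkBase; // hypothetical option name

  // Link class-name tokens to reference docs when a base URL was configured.
  if (typeof linkBase === 'string' && token.types.indexOf('class-name') !== -1) {
    return {
      tag: 'a',
      props: {className, href: linkBase + encodeURIComponent(token.content)},
      text: token.content,
    };
  }

  // Default behavior: a plain span, as the Line component produces today.
  return {tag: 'span', props: {className}, text: token.content};
}
```

The point of the design is that the decision between a span and a link lives in one small, replaceable unit that receives both the token and the options, rather than being hard-coded in the Line component.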
Have you tried building it?
I successfully created a rough prototype locally by swizzling CodeBlock and Line to pass through the metadata and customize the generated token. However, it requires swizzling various internal components, which is risky.
Self-service
- [x] I'd be willing to contribute this feature to Docusaurus myself.
Thanks for the feature design.
I generally agree with many things you said.
This historical component is quite messy in its current state, and we should probably refactor it, handle the metastring in one place, and split it into more granular, encapsulated components to make it easier to swizzle and extend.
In general, I'm not a fan of introducing a custom grammar for the metastring, and I think we should rather rely on something that already exists. I guess we could just use URLSearchParams. The problem is that we historically support the {1,3-5} syntax, and removing it would be a breaking change. However, it's probably easy to add a pre-processing step that expands {1,3-5} to highlight=1 highlight=3-5 and later parse it in a more generic way (using spaces would probably be less awkward than using &).
In general, I think we should adopt React Server Components, and then eventually rebuild this code block component from the ground up based on our ability to use heavier highlighters like Shiki.
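The pre-processing idea could be sketched as follows. The function names `expandRangeSyntax` and `parseMetastringAsParams` are hypothetical. The sketch splits naively on whitespace, so quoted values containing spaces would not survive, and URLSearchParams applies its usual decoding of `+` and percent escapes.

```typescript
// Hypothetical helpers for illustration; not part of Docusaurus.

// Expands a leading `{1,3-5}` into `highlight=1 highlight=3-5`.
function expandRangeSyntax(metastring: string): string {
  return metastring.replace(
    /^\s*\{([^}]*)\}/,
    (_match: string, ranges: string) =>
      ranges
        .split(',')
        .map((range) => `highlight=${range.trim()}`)
        .join(' '),
  );
}

// Turns the space-separated options into a query string and lets
// URLSearchParams do the generic parsing.
function parseMetastringAsParams(metastring: string): URLSearchParams {
  const expanded = expandRangeSyntax(metastring).trim();
  const query = expanded === '' ? '' : expanded.split(/\s+/).join('&');
  return new URLSearchParams(query);
}
```

Repeated keys are preserved, so `params.getAll('highlight')` yields every range, while a bare flag such as `showLineNumbers` shows up as a key with an empty value that can be tested with `params.has(...)`.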
Until we get there, let's try to not be too disruptive. We can refactor/improve things incrementally. Some changes can be non-breaking, and we can also ship some breaking changes gated behind a v4 future flag.
If you want to send small refactor PRs, I'm happy to review them. It seems relatively safe to:
- Centralize things like the meta string parsing in a single place
- Create other subcomponents like <LineToken>
It's probably worth considering other pending code block PRs I need to review:
- https://github.com/facebook/docusaurus/pull/10461
- https://github.com/facebook/docusaurus/pull/9349
Thanks for the feedback. I also agree with your points. The RFC might sound a bit more complex than the effective code change will likely be.
On the grammar you're right, but on the other hand, the syntax is already established (numbering, title, etc.) and regex-based parsing is mostly implemented in codeBlockUtils. Writing a low-level parser (e.g. a recursive descent parser) might have some slight performance benefits, but those likely do not justify the added complexity and maintenance effort.
I'll give it a shot and propose a PR; then we'll get a better feeling for the impact and real complexity.
@slorber A first proposal is now ready at https://github.com/facebook/docusaurus/pull/11011