docusaurus Improve CodeBlock Extensibility

Have you read the Contributing Guidelines on issues?

[x] I have read the Contributing Guidelines on issues.

Description

The main goal of this feature is to let developers extend syntax highlighting in a React/Docusaurus-compatible way, working around the fact that Prism.js plugins are not available.

Has this been requested on Canny?

No response

Motivation

Docusaurus / prism-react-renderer do not allow using standard Prism.js plugins because those plugins manipulate the DOM directly. It has been discussed that Prism plugins are not supported, and using alternative components for syntax highlighting has also been discussed:

https://github.com/facebook/docusaurus/issues/3318
https://github.com/facebook/docusaurus/issues/9122

With this proposal, developers would get a mechanism for enriching syntax highlighting through magic comments and swizzling, without major impact on Docusaurus core functionality.

My personal goal is to develop a plugin that generates hyperlinks for individual rendered tokens, letting users jump to the related reference documentation. The current implementation requires me to swizzle and adapt a major part of the CodeBlock components to achieve this.

API design

As of today Docusaurus offers:

Magic comments as a means of highlighting individual lines via CSS classes.
Metadata string as a means of injecting simple options.
Fine-grained swizzling as a mechanism for customizing component rendering.

The extensibility of CodeBlocks should build on the same strategy. As a first step, the goal is to allow individual website authors to customize the components to their needs. At this point it is a non-goal to provide a fully fledged plugin system where you could pull NPM packages and register them. This could become an option in the future.

The concrete proposal is to:

1. Provide a mechanism for end users to inject "plugin" configurations. (optional but recommended)

1.1. Parse the metadata string into an option bag.

The metadata string is currently a plain string that is parsed ad hoc in several places. Docusaurus should instead parse the options using a defined syntax:

grammar ExprParser;

metadataComment

metadata
    : rangeSyntax? optionSyntax* EOF;

rangeSyntax: '{' lineRange (',' lineRange)* '}';

lineRange: INT 
         | INT '-' INT;

optionSyntax: ID
            | ID '=' ID
            | ID '=' STRING;
    
INT : [0-9]+ ;
ID: [a-zA-Z_][a-zA-Z_0-9]* ;
WS: [ \t\n\r\f]+ -> skip ;
STRING : '"' ~[<"]* '"' | '\'' ~[<']* '\'';

The options are then parsed into a defined data structure for easy access in later steps. Docusaurus itself could also benefit from this bag internally, instead of doing "contains" checks on the raw string.

https://github.com/facebook/docusaurus/blob/3782244ce7fd95fdb3f94444884ae5a1a85da915/packages/docusaurus-theme-common/src/utils/codeBlockUtils.ts#L155-L165

Options are meant to be unique: if an option is specified more than once, the later value overwrites the earlier one. Support for arrays or maps as values could be added later if the need arises.
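As a rough illustration of 1.1, the grammar above could be approximated with a small hand-written parser. This is only a sketch: `parseMetastring` and `ParsedMetadata` are hypothetical names (not existing Docusaurus APIs), and flag-style options without a value are assumed to map to `true`.

```typescript
// Hypothetical option-bag shape; not part of Docusaurus today.
type OptionValue = string | true;

interface ParsedMetadata {
  lineRanges: Array<[number, number]>; // inclusive [start, end] pairs
  options: Record<string, OptionValue>;
}

function parseMetastring(metastring: string): ParsedMetadata {
  const result: ParsedMetadata = { lineRanges: [], options: {} };
  let rest = metastring.trim();

  // rangeSyntax: '{' lineRange (',' lineRange)* '}'
  const rangeMatch = /^\{([^}]*)\}/.exec(rest);
  if (rangeMatch) {
    for (const part of rangeMatch[1].split(",")) {
      const m = /^\s*(\d+)\s*(?:-\s*(\d+))?\s*$/.exec(part);
      if (!m) throw new Error(`Invalid line range: "${part}"`);
      const start = Number(m[1]);
      const end = m[2] !== undefined ? Number(m[2]) : start;
      result.lineRanges.push([start, end]);
    }
    rest = rest.slice(rangeMatch[0].length);
  }

  // optionSyntax: ID | ID '=' ID | ID '=' STRING (sticky regex, one option per match)
  const optionPattern =
    /\s*([a-zA-Z_][a-zA-Z_0-9]*)(?:\s*=\s*(?:"([^<"]*)"|'([^<']*)'|([a-zA-Z_][a-zA-Z_0-9]*)))?/y;
  let pos = 0;
  while (pos < rest.length) {
    optionPattern.lastIndex = pos;
    const m = optionPattern.exec(rest);
    if (!m) {
      if (rest.slice(pos).trim() === "") break;
      throw new Error(`Unexpected input at: "${rest.slice(pos)}"`);
    }
    // A repeated option simply overwrites the earlier value.
    result.options[m[1]] = m[2] ?? m[3] ?? m[4] ?? true;
    pos = optionPattern.lastIndex;
  }
  return result;
}
```

With this sketch, `parseMetastring('{1,3-5} showLineNumbers title="My File"')` would yield the ranges `[1,1]` and `[3,5]` plus an option bag containing `showLineNumbers: true` and `title: "My File"`.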

1.2. Metadata Magic Comments

The metadata string requires all options to be placed on a single line, which is limiting when many options need to be specified. Hence it would be a useful extension to also allow specifying the options from 1.1 via a magic comment syntax.

These magic comments would be parsed into the property bag and then erased.

For a simple initial version, we parse the whole code for magic comments and fill the bag at once. Later this could be extended so that options can be changed at any point in the code, affecting subsequent lines.

An option comment starts with a configurable special prefix (to be defined) that should not interfere with the majority of real comments, and it allows only one option per comment. The configuration option prism: { optionCommentPrefix: '!docusaurus-' } allows users to eliminate conflicts.

// !docusaurus-option
// !docusaurus-option2=true
// !docusaurus-option3=value
// !docusaurus-option4="long value"
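The "parse everything at once, then erase" behaviour could look roughly like the sketch below. It is illustrative only: `extractOptionComments` is a hypothetical helper, it handles only `//` line comments (real magic comments would need per-language comment styles), and flag-style options are assumed to map to `true`.

```typescript
interface ExtractedOptions {
  code: string; // code with option comments removed
  options: Record<string, string | true>;
}

function extractOptionComments(
  code: string,
  prefix: string = "!docusaurus-", // assumed default, configurable per the proposal
): ExtractedOptions {
  // Escape regex metacharacters so any user-chosen prefix is matched literally.
  const escaped = prefix.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // One option per comment: name, name=identifier, or name="quoted value".
  const pattern = new RegExp(
    `^\\s*//\\s*${escaped}([a-zA-Z_][a-zA-Z_0-9]*)` +
      `(?:=(?:"([^<"]*)"|'([^<']*)'|([a-zA-Z_][a-zA-Z_0-9]*)))?\\s*$`,
  );

  const options: Record<string, string | true> = {};
  const keptLines: string[] = [];
  for (const line of code.split("\n")) {
    const m = pattern.exec(line);
    if (m) {
      // Option comments fill the bag and are erased from the output.
      options[m[1]] = m[2] ?? m[3] ?? m[4] ?? true;
    } else {
      keptLines.push(line);
    }
  }
  return { code: keptLines.join("\n"), options };
}
```

Ordinary comments that do not start with the prefix are left untouched, so only lines such as `// !docusaurus-option4="long value"` disappear from the rendered code.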

2. Create a new swizzlable component for rendering individual tokens and ensure it receives the metadata bag. (required for MVP)

2.1. Add a swizzlable LineToken

Currently the Line component creates a span preventing any customization of the rendered token.

https://github.com/facebook/docusaurus/blob/3782244ce7fd95fdb3f94444884ae5a1a85da915/packages/docusaurus-theme-classic/src/theme/CodeBlock/Line/index.tsx#L44

A new component would allow swizzling and customizing the rendering at the token level (e.g. a dev could render an <a href=""> instead).

2.2. Pass through the metadata options.

The parsed options from 1. should be passed down to the Line and LineToken components so devs can make use of them. If we decide not to parse the metadata string (yet?), at least the plain string available in String.tsx should be passed through.
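To make 2.1 and 2.2 more concrete, here is a framework-free sketch of the decision a swizzled LineToken could make once it receives the option bag. Everything here is an assumption for illustration: `renderLineToken`, `RenderedToken` and the `linkBase` option are hypothetical, the token shape (`types`, `content`) is modelled on what prism-react-renderer hands to the Line component, and a real implementation would return JSX rather than a plain descriptor.

```typescript
// Token shape assumed to resemble prism-react-renderer's tokens.
interface Token {
  types: string[];
  content: string;
}

type MetadataOptions = Record<string, string | true>;

// Plain descriptor standing in for the JSX a real component would return.
interface RenderedToken {
  tag: "span" | "a";
  className: string;
  text: string;
  href?: string;
}

function renderLineToken(token: Token, options: MetadataOptions): RenderedToken {
  const className = ["token", ...token.types].join(" ");
  const linkBase = options["linkBase"]; // hypothetical option name

  // Only link class names, and only when the author configured a link base.
  if (typeof linkBase === "string" && token.types.includes("class-name")) {
    return {
      tag: "a",
      className,
      text: token.content,
      href: `${linkBase}${encodeURIComponent(token.content)}`,
    };
  }
  // Default behaviour: the plain span that Line renders today.
  return { tag: "span", className, text: token.content };
}
```

The point of the sketch is that the default path stays identical to the current span rendering, while a site author who swizzles LineToken can branch on options without touching CodeBlock or Line.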

Have you tried building it?

I successfully created a rough prototype locally by swizzling CodeBlock and Line to pass through the metadata and customize the generated token. However, it requires swizzling various internal components, which is risky.

Self-service

[x] I'd be willing to contribute this feature to Docusaurus myself.

Mar 19 '25 12:03 Danielku15

Thanks for the feature design.

I generally agree with many things you said.

This historical component is quite messy in its current state, and we should probably refactor it, handle the metastring in one place, and split it into more granular, encapsulated components to make it easier to swizzle and extend.

In general, I'm not a fan of introducing a custom grammar for the metastring, and I think we should rather rely on something that already exists. I guess we could just use URLSearchParams. The problem is that we historically support the {1,3-5} syntax, and removing it would be a breaking change. However, it's probably easy to add a pre-processing step that expands {1,3-5} to highlight=1 highlight=3-5 and later parse it in a more generic way (using spaces would probably be less awkward than using &).

In general, I think we should adopt React Server Components, and then eventually rebuild this code block component from the ground up based on our ability to use heavier highlighters like Shiki.
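The pre-processing step mentioned above could be sketched as follows. `expandRangeSyntax` is a hypothetical helper name, and the `highlight` key simply follows the example in this comment; it is not an existing Docusaurus option.

```typescript
// Expands the legacy "{1,3-5}" range syntax into generic key=value options,
// e.g. "{1,3-5} title=x" becomes "highlight=1 highlight=3-5 title=x".
function expandRangeSyntax(metastring: string): string {
  return metastring.replace(/\{([\d,\s-]+)\}/, (_match: string, ranges: string) =>
    ranges
      .split(",")
      .map((range) => range.trim())
      .filter((range) => range !== "")
      .map((range) => `highlight=${range}`)
      .join(" "),
  );
}
```

A metastring without a range block passes through unchanged, so the step could run unconditionally before any generic option parsing.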

Until we get there, let's try to not be too disruptive. We can refactor/improve things incrementally. Some changes can be non-breaking, and we can also ship some breaking changes gated behind a v4 future flag.

If you want to send small refactor PRs, I'm happy to review them. It seems relatively safe to:

Centralize things like the meta string parsing in a single place
Create other subcomponents like <LineToken>

It's probably worth considering other pending code block PRs I need to review:

https://github.com/facebook/docusaurus/pull/10461
https://github.com/facebook/docusaurus/pull/9349

Mar 19 '25 14:03 slorber

Thanks for the feedback. I also agree with your points. The RFC might sound a bit more complex than the effective code change will likely be.

On the grammar you're right, but on the other hand, the syntax is already established (numbering, title, etc.) and regex-based parsing is mostly implemented in codeBlockUtils. Writing a low-level parser (e.g. a recursive descent parser) might have some slight performance benefits, which likely do not justify the added complexity and maintenance effort.

I'll give it a shot to propose a PR, then we get a better feeling on the impact and real complexity.

Mar 19 '25 15:03 Danielku15

@slorber A first proposal is now ready at https://github.com/facebook/docusaurus/pull/11011

Mar 19 '25 19:03 Danielku15
