typedoc-plugin-markdown

Bug: Typedoc comments with text wrapped in "<" and ">" breaks Docusaurus Markdown parser

Open mongodben opened this issue 2 years ago • 8 comments

hello,

i'm using docusaurus-plugin-typedoc. it works perfectly, except that it has trouble with the characters < and > when they wrap a single word in a Typedoc comment. in this case, the Docusaurus interpreter reads the word as a JSX element, and the compilation process crashes.

this issue occurred when parsing https://github.com/Chevrotain/chevrotain/blob/master/packages/types/api.d.ts#L160

the word <tokType> causes the problem. the error output is:

SyntaxError: /Users/ben.p/projects/Bluehawk/docs/docs/api/classes/RootParser.md: Expected corresponding JSX closing tag for <tokType>. (298:74)
  296 | In EBNF terms this is equivalent to a Terminal.`}</p>
  297 | <p>{`A Token will be consumed, IFF the next token in the token vector matches `}<tokType>{`.
> 298 | otherwise the parser may attempt to perform error recovery (if enabled).`}</p>
      |                                                                           ^
  299 | <p>{`The index in the method name indicates the unique occurrence of a terminal consumption
  300 | inside a the top level rule. What this means is that if a terminal appears
  301 | more than once in a single rule, each appearance must have a `}<strong parentName="p">{`different`}</strong>{` index.`}</p>
client (webpack 5.64.4) compiled with 1 error

i was able to fix this error by replacing <tokType> with "tokType" in the file (anything without the angle brackets would work) and rerunning yarn start.

would be great if the parser could take this scenario into account and sanitize the text, perhaps by replacing the angle brackets with their HTML escape sequences &lt; and &gt;.

mongodben • Dec 17 '21 20:12

hi @mongodben .. Docusaurus is trying to parse the output as MDX. I would suggest the solution here is to simply wrap the word in backticks, which will render it as an inline code snippet.

* A Token will be consumed, IFF the next token in the token vector matches `<tokType>`.

tgreyuk • Dec 17 '21 23:12

thanks tom. unfortunately, the source code comes from an external library, so i don't have direct control over the typedoc comment which is breaking the MDX. i put in a PR with them with a change similar to what you propose here.

mongodben • Dec 21 '21 16:12

ok thanks for letting me know @mongodben .. i will re-open this and have a think if this is a use-case that should be fixed at plugin level.

tgreyuk • Dec 23 '21 22:12

I think the simple solution is to escape the HTML (<>) as (&lt;&gt;), but contents inside ` and ``` must be left alone.

A smarter solution is to use a package such as xss to sanitize the output.
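
As a rough illustration of the first idea (not code from this plugin), the escaping can be limited to text outside inline code spans and fenced blocks. The helper name `escapeAngleBrackets` is hypothetical, and the detection is deliberately simplified: it assumes well-formed backticks and does not handle indented code or blockquote markers.

```typescript
// Hypothetical helper, not part of typedoc-plugin-markdown.
// Escapes < and > in Markdown prose, leaving fenced code blocks (```)
// and inline code spans (`...`) untouched.
// Known limitation: a leading ">" (blockquote marker) is escaped too.
function escapeAngleBrackets(markdown: string): string {
  const out: string[] = [];
  let inFence = false;

  for (const line of markdown.split("\n")) {
    // A line starting with ``` opens or closes a fenced code block.
    if (/^\s*```/.test(line)) {
      inFence = !inFence;
      out.push(line);
      continue;
    }
    if (inFence) {
      out.push(line);
      continue;
    }
    // Splitting on a capturing group puts inline code spans at odd indices.
    const parts = line.split(/(`[^`]*`)/);
    out.push(
      parts
        .map((part, i) =>
          i % 2 === 1
            ? part
            : part.replace(/</g, "&lt;").replace(/>/g, "&gt;")
        )
        .join("")
    );
  }
  return out.join("\n");
}

// Prose is escaped, inline code is kept as written.
console.log(escapeAngleBrackets("matches <tokType>."));   // matches &lt;tokType&gt;.
console.log(escapeAngleBrackets("matches `<tokType>`.")); // matches `<tokType>`.
```

A line-based pass like this is cheap, but a real Markdown tokenizer would be more robust for nested or unusual fences.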

edgardmessias • Mar 08 '22 12:03

xss makes sense to me for covering this case and anything else that could happen w unexpected processing of markup (in addition to XSS attacks via the docs, i guess). hopefully there wouldn't be a large implication on build time to run all text through the xss processor.

happy to implement the feature if pointed in the right direction where to implement w/in this project.


mongodben • Mar 08 '22 14:03

I just created a TypeDoc plugin that sanitizes using DOMPurify: https://github.com/fsmaia/typedoc-plugin-dompurify

However, it is sanitizing/removing content inside codeblocks.

Look at the following example:

/**
 * A custom component.
 *
 * @example
 * ```tsx
 * <CustomComponent><h1>Custom</h1></CustomComponent>
 * ```
 */
const CustomComponent: React.FC = ({ children }) => <>{children}</>

The resulting markdown is:

## Description

A custom component.

## Example

```tsx
<h1>Custom</h1>
`` `

Note: ignore the last space before code block close, it was just to escape it.

At this point, how could I use the TypeDoc plugin syntax/tools to avoid sanitizing an excerpt inside a code block (```)?

I'm thinking about running the marked parser just to know which excerpts are code blocks, so I can skip them.

fsmaia • Aug 19 '22 18:08

@fsmaia Are there instructions for how a Docusaurus site could use that?

tony • Nov 26 '22 19:11

I'm also getting hit by this:

https://github.com/lit/lit/blob/02c28397c612845ffab9e22af08eecc869f8ce47/packages/reactive-element/src/reactive-element.ts#L571-L574

   * 'nonce-<base64-value>' with <base64-value> replaced be a server-generated
   * nonce.
   *
   * To provide a nonce to use on generated <style> elements, set
Error with LitElement
SyntaxError: site/docs/api/classes/OEmbedElement.md: Expected corresponding JSX closing tag for <base64-value>. (2022:8)
  2020 | CSP directive, the style-src value must either include 'unsafe-inline' or
  2021 | 'nonce-`}<base64-value>{`' with `}<base64-value>{` replaced be a server-generated
> 2022 | nonce.`}</p>
       |         ^
  2023 | <p>{`To provide a nonce to use on generated `}<style>{` elements, set
  2024 | `}<inlineCode parentName="p">{`window.litNonce`}</inlineCode>{` to a server-generated nonce in your page's HTML, before
  2025 | loading application code:`}</p>
SyntaxError: site/docs/api/index.md: Expected corresponding JSX closing tag for <MDXLayout>. (68:165)
  66 | <h3 {...{"id":"static-site"}}>{`Static Site`}</h3>
  67 | <p>{`This project includes a simple website generated with the `}<a parentName="p" {...{"href":"11ty.dev"}}>{`eleventy`}</a>{` static site generator and the templates and pages in `}<inlineCode parentName="p">{`/docs-src`}</inlineCode>{`. The site is generated to `}<inlineCode parentName="p">{`/docs`}</inlineCode>{` and intended to be checked in so that GitHub pages can serve the site `}<a parentName="p" {...{"href":"https://help.github.com/en/github/working-with-github-pages/configuring-a-publishing-source-for-your-github-pages-site"}}>{`from `}<inlineCode parentName="a">{`/docs`}</inlineCode>{` on the master branch`}</a>{`.`}</p>
> 68 | <p>{`To enable the site go to the GitHub settings and change the GitHub Pages `}{`"`}{`Source`}{`"`}{` setting to `}{`"`}{`master branch /docs folder`}{`"`}{`.`}</p></p>
     |                                                                                                                                                                      ^
  69 | <p>{`To build the site, run:`}</p>
  70 | <pre><code parentName="pre" {...{"className":"language-bash"}}>{`npm run docs
  71 | `}</code></pre>
client (webpack 5.75.0) compiled with 2 errors

Package: [email protected]

tony • Nov 26 '22 19:11

[email protected]

https://typedoc-plugin-markdown.org/docs/options#sanitizecomments
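
A minimal sketch of how this might look in a Docusaurus config, assuming `sanitizeComments` is a boolean option (the name is taken from the link above) and that docusaurus-plugin-typedoc passes it through to the markdown plugin. The constant name, `entryPoints`, and `tsconfig` values are placeholders, not from this thread:

```typescript
// Hypothetical plugin entry for the `plugins` array in docusaurus.config.
// Paths are placeholders; adjust them to your project layout.
const typedocPluginEntry: [string, Record<string, unknown>] = [
  "docusaurus-plugin-typedoc",
  {
    entryPoints: ["../src/index.ts"],
    tsconfig: "../tsconfig.json",
    // Assumption: escapes HTML/JSX-like text in comments so MDX does not parse it.
    sanitizeComments: true,
  },
];
```

Check the linked options page for the exact behaviour and default before relying on this.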

tgreyuk • May 03 '24 19:05