RFC: Escaping doc comments inside doc comments
It's useful to include code examples in documentation comments. Sometimes you need to illustrate what the output of a code example is, so you put it in a comment below, and if it's multiple lines, you use a block comment. The problem is that TSDoc doesn't correctly parse this:
/**
* ```
* getAverage()
* /* 'some output' */
* ```
*/
function getAverage(x, y) {
return (x + y) / 2.0;
}
I think it should handle block comments when they're inside a code block in the documentation comment.
This isn't a TSDoc problem. It's a JavaScript problem. JavaScript (and therefore TypeScript) does not support nested block comments, so the whole comment ends at 'some output' */
We could address this by providing an escaping mechanism like *\/. We need to formalize escaping in TSDoc but I've been procrastinating it because the problem is very thorny due to the onion skin of JSDoc tags, Markdown, HTML, and /* */. HTML is the only one of those four things that provides sane escaping mechanisms, but it would be *&#47;... (This is the problem with de facto standards heheh.)
TSDoc escaping will need to depend heavily on context. This works against our goal that the rendered result should be predictable by a person who doesn't have access to an interactive renderer. For example, the most convenient rule is probably: "Inside a markdown code fence, every character is preserved literally except for \/ which encodes a slash to support nested comments." But this rule is nearly impossible to infer. You would have to read the TSDoc spec and memorize it. So instead we would much prefer a rule like "Inside a markdown code fence, the \ character escapes any printable character that follows it." That's easy to infer and remember. But the quoted code itself may rely heavily on backslash escapes, which would cause nasty double-encodings. Perhaps &#47; is not so bad!
This problem is fairly difficult. Users have a very strong expectation that a fenced code block is reproduced literally. For example, suppose we choose \ as our escape character. We get something like this:
/**
* ```ts
* /**
* * Matches "abc\/def" or "*\\/abc" but not "C:\\foo\\bar"
* *\/
* function foo(path) {
* return /[a-z*]+\/[a-z]+$/.test(path);
* }
* ```
*/
If we allow \/ to be escaped by the /* */ framing, it needs to be just the / character. It would be wildly unacceptable to expand every single \ escape before TSDoc starts parsing the input, because \ is a standard escape character used by Markdown, JSDoc, and also source code.
But this means \/ gets expanded before \\. So the second layer of the onion skin will look like this:
```ts
/**
* Matches "abc/def" or "*\/abc" but not "C:\\foo\\bar"
*/
function foo(path) {
return /[a-z*]+/[a-z]+$/.test(path);
}
```
It's highly confusing why the / in the RegExp got expanded but not the "C:\\, but I suppose someone could memorize that / is special. It's particularly counterintuitive with "*\\/abc" where \\/ gets processed from right-to-left, whereas a casual reader would assume the opposite.
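To make the ordering concrete, here is a minimal sketch of that first layer, assuming it expands only \/ and leaves every other backslash alone. The function name expandSlashEscapes is hypothetical and is not part of TSDoc.

```ts
// Hypothetical first-layer decoder (an assumption for illustration, not TSDoc's API):
// it expands only "\/" into "/" and does not touch any other backslash sequence.
function expandSlashEscapes(framed: string): string {
  return framed.replace(/\\\//g, "/");
}

// Two lines taken from the framed example above, written with String.raw
// so that the backslashes appear exactly as they would in the doc comment.
const framedLines: string[] = [
  String.raw`Matches "abc\/def" or "*\\/abc" but not "C:\\foo\\bar"`,
  String.raw`return /[a-z*]+\/[a-z]+$/.test(path);`,
];

for (const line of framedLines) {
  console.log(expandSlashEscapes(line));
}
```

Because the replacement scans for \/ only, the second backslash of \\/ pairs with the slash, which is the right-to-left surprise described above, while "C:\\foo\\bar" passes through unchanged.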
The *&#47; might be more acceptable merely because people don't use HTML escapes as much.
I wonder if *//* could be used as an escape character. 🤭 That at least doesn't conflict with any possible expression inside a comment.
I thought of a possibly better solution to this problem!
Suppose we have an API like this:
function isDocComment(s: string): boolean {
return /^\/\*.*\*\/$/.test(s);
}
...and we want to show docs like this:
isDocComment()
function isDocComment(s: string): boolean;
Returns true if s is enclosed in /* and */ comment characters.
Example
// This prints "true":
console.log(isDocComment("/** @public */"));
When we try to write TSDoc like this...
/**
* Returns `true` if `s` is enclosed in `/*` and `*/` <=== ERROR
* comment characters.
*
* @example
* ```ts
* // This prints "true":
* console.log(isDocComment("/** @public */")); <=== ERROR
* ```
*/
....we encounter a syntax error because JavaScript provides no mechanism for */ to appear inside a /** */ doc comment. But in well-formed doc comments, the * framing always has a space after it. Idea: What if we adopted a convention that *+ continues the previous line? Then we could escape it like this:
/**
* Returns `true` if `s` is enclosed in `/*` and `*
*+/`
* comment characters.
*
* @example
* ```ts
* // This prints "true":
* console.log(isDocComment("/** @public *
*+/"));
* ```
*/
The TSDoc parser would discard the newline and *+ when it extracts the content from the /** */ framing, and then the emitter could inject these escapes as needed.
If you want to line it up horizontally for readability, maybe we could also allow multiple + characters, like this:
/**
* Returns `true` if `s` is enclosed in `/*` and `*
*+++++++++++++++++++++++++++++++++++++++++++++++++/`
* comment characters.
*
* @example
* ```ts
* // This prints "true":
* console.log(isDocComment("/** @public *
*++++++++++++++++++++++++++++++++++++++++/"));
* ```
*/
(The delimiter doesn't have to be + . It could be any symbol other than * or /, however we should avoid characters that already have special meaning for Markdown or HTML. That eliminates a lot of options, leaving us mostly with +, -, or ^.)
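As a rough sketch of the parser side of this idea, the following strips the /** */ framing and joins any line that begins with *+ (one or more + characters) onto the previous line. The function name extractContent is hypothetical, and this is not how the TSDoc parser is actually implemented; it also assumes the comment opens and closes on their own lines.

```ts
// Hypothetical sketch (not the TSDoc implementation): extract the content of a
// doc comment, treating a line that starts with "*+" as a continuation.
function extractContent(docComment: string): string {
  const lines = docComment.split(/\r?\n/);
  let result = "";
  let first = true;
  // Assumption: the first line is the opening framing and the last line is the closing framing.
  for (let i = 1; i < lines.length - 1; i++) {
    const line = lines[i].trimStart();
    const continuation = /^\*\++(.*)$/.exec(line);
    if (continuation) {
      // Discard the newline and the "*+" prefix, then append the rest directly.
      result += continuation[1];
      continue;
    }
    // Ordinary line: drop the "*" framing and one optional space after it.
    const body = line.replace(/^\* ?/, "");
    result += (first ? "" : "\n") + body;
    first = false;
  }
  return result;
}

const example = [
  "/**",
  " * Returns `true` if `s` is enclosed in `/*` and `*",
  " *+/`",
  " * comment characters.",
  " */",
].join("\n");

console.log(extractContent(example));
```

One design consequence: content that legitimately needs a + immediately after a continuation cannot be told apart from padding, so a real implementation would have to pick a rule for that case.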
What do you think? This seems better than the earlier proposals.
@rbuckton @iansan5653 @EisenbergEffect FYI
I read your comment first, and then had to go back and read all the previous comments to be able to understand what that escaping is actually doing. Of course, I'm not complaining about having to read, but I think this would be very difficult to understand. It's one of those things that comes up enough that it would need to be used every so often, but not often enough that anyone would have any idea what they're looking at when they see it.
Is the only use case when you would ever use + for continuation when you have */ in a comment? If it's really only this one case, maybe it would be better to just reserve a special character sequence that always (no matter where it is in a TSDoc comment) resolves to */. Something like &COMMENT_CLOSE; maybe?
> maybe it would be better to just reserve a special character sequence that always (no matter where it is in a TSDoc comment) resolves to */. Something like &COMMENT_CLOSE; maybe?
How would your approach represent this case? 🤔
/**
* Here is an example of how to escape `/
*+*` using TSDoc:
*
* ```ts
* /**
* * Returns `true` if `s` is enclosed in `/*` and `*
* *+/`
* * comment characters.
* *
* * @example
* * ```ts
* * // This prints "true":
* * console.log(isDocComment("/** @public *
* *+/"));
* * ```
* */
* ```
*/
> Is the only use case when you would ever use + for continuation when you have */ in a comment?
It doesn't have to be a comment though; e.g. it could be a glob or RegExp with those characters.
BTW this issue might also arise if someone is using DocComment.emitAsTsdoc() to generate comments containing arbitrary strings. If */ isn't escaped properly, the output would have an actual JavaScript syntax error.
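For the emitter side, a minimal sketch of injecting *+ escapes might look like the following. The function emitDocComment is hypothetical and is not the real DocComment.emitAsTsdoc(); it only illustrates splitting every */ so that the * ends one line and the / starts a continuation line.

```ts
// Hypothetical emitter sketch (not DocComment.emitAsTsdoc()): wraps arbitrary
// text in doc comment framing, breaking each "*/" across a "*+" continuation.
function emitDocComment(content: string): string {
  const lines: string[] = ["/**"];
  for (const line of content.split("\n")) {
    // Splitting on the closing sequence guarantees no piece contains it.
    const pieces = line.split("*/");
    lines.push(" * " + pieces[0] + (pieces.length > 1 ? "*" : ""));
    for (let i = 1; i < pieces.length; i++) {
      const isLast = i === pieces.length - 1;
      // The "/" moves to a continuation line; a trailing "*" is added if another split follows.
      lines.push(" *+/" + pieces[i] + (isLast ? "" : "*"));
    }
  }
  lines.push(" */");
  return lines.join("\n");
}

console.log(emitDocComment('console.log(isDocComment("/** @public */"));'));
```

With this approach the only closing sequence left in the output is the final framing line, so arbitrary strings no longer produce a JavaScript syntax error.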
This feature could also be used to wrap long lines, e.g. to satisfy an ESLint rule that limits line length.
/**
* For more info please see https://github.com/micro
*+soft/tsdoc/issues/166
*/
Here's a poll where you can vote on the syntax:
https://twitter.com/octogonz_/status/1204589348525527040?s=19
@DanielRosenwasser, @sandersn: What do you think of this? If *+ essentially becomes a JSDoc/TSDoc line continuation character, we might want to update our parser/language service to support it for quick info.
> How would your approach represent this case?
I wrote a comment before but deleted it because I misunderstood what you're asking.
I don't know if we necessarily need to support this scenario - someone could just link to the spec page. Your escaping method does support this, but at the expense of being pretty difficult to read and write. Even your own example has a syntax error in it - it should be:
/**
* Here is an example of how to escape `/
*+*` using TSDoc:
*
* ```ts
* /**
* * Returns `true` if `s` is enclosed in `/*` and `*
* *+/`
* * comment characters.
* *
* * @example
* * ```ts
* * // This prints "true":
* * console.log(isDocComment("/** @public *
* *+/"));
* * ```
* *
*+/
* ```
*/
I do think being able to break URLs is actually a more useful feature than being able to escape comment-ending characters though and for that it might be worth it:
/**
* Add two numbers.
*
* @param a - the first number to add.
* @param b - the second number to add.
*
* @example Basic Usage
* ```ts
* add(123, 456); // -> 579;
* ```
* See it in action: [demo](http://www.typescriptlang.org/play/#code/GYVwdgxgLgl
*+g9mABAQwCaoBTIFyLCAWwCMBTAJwBpEjd9jyBKWw0sxAbwChEfEySoIMkmSIA1NQDcnAL6dOaTAEY
*+ATAGYqAFgCsANgaSgA).
*/
In this case I think the + definitely makes the most sense, although it does introduce the possibility of being misread as a space in the URL.
Another consideration - I don't think it would be an issue as long as the line-continuation behavior only occurs if there is no space before the character, but + and - are both markdown unordered list delimiters, so this has the potential for being misinterpreted as a list (both visually and by the parser).
> but + and - are both markdown unordered list delimiters, so this has the potential for being misinterpreted as a list (both visually and by the parser).
That could be an argument for ^. You could vote for that in the Twitter poll heheh. However, I'm not sure + will be used for lists. I've been arguing for TSDoc-flavored-Markdown to only allow - for lists, at least in strict mode. (Compared to typical Markdown scenarios, TSDoc often has a very long processing pipeline from doc comments to the final website. This means predictable rendering is a much higher priority for TSDoc. Ideally there shouldn't be any guesswork about how an expression will get rendered. This makes us want to avoid allowing redundant alternative syntaxes, and I'm also hoping to avoid lists-inside-lists, since nesting Markdown structures rely on counting spaces/newlines which is often unpredictable as well.)
By the way, we do plan to support HTML character entity references in TSDoc markdown. So instead of &COMMENT_CLOSE;, you would be able to represent */ as *&#47; in many contexts.
But Markdown does not allow this inside a code block. For example:
/**
* In a future update to the TSDoc parser, this will get decoded: *&#47;
*
* ```
* const x = "*/"; // <--- this must be rendered as-is
* ```
*/
> You could vote for that in the Twitter poll
I was about to, but I think ^ is actually the least intuitive (and also the ugliest) of the three - out of the options, I think + is the nicest-looking. Haven't really decided yet lol.
This seems like a lot of effort to work around a problem that could more readily be solved with "don't put */ in your code examples". Comments in a code block in a multi-line comment could just as easily be single-line comments, and if for some reason you need to represent */ in a string, just use something like one of these:
/**
* ```js
* foo("/*comment*" + "/")
* foo("/*comment*\/")
* ```
*/
ESLint (and other linters) can usually be configured to ignore certain comments or URLs in comments that break the line-length rule, so supporting some sort of line-continuation character doesn't seem very compelling.
For documentation generation purposes, you could always define a special @ tag that includes the text of the documentation (or code sample) from another file, though it's unlikely editors would support this for things like quick-info.
> This seems like a lot of effort to work around a problem that could more readily be solved with "don't put */ in your code examples".
Not sure how persuasive it is, but here are some samples of real world code that would have benefited from correctly encoding */ in a doc comment:
https://github.com/microsoft/tsdoc/blob/9e0c30a1e0db0e34db13d481eec393e860419c6b/tsdoc/src/details/StandardTags.ts#L332-L350
rush-lib/src/cli/actions/InitAction.ts:
// Matches a well-formed END macro ending a block section.
// Example: /*[END "DEMO"]*/
//
// Group #1 is the indentation spaces before the macro
// Group #2 is the section name
private static _endMacroRegExp: RegExp = /^(\s*)\/\*\[END "([A-Z]+)"\]\s*\*\/\s*$/;
rush-lib/src/cli/actions/InitAction.ts:
// "Block section" macros have this form:
//
// /*[BEGIN "NAME"]*/
// (content goes
// here)
// /*[END "NAME"]*/
//
And here are some .md files that just as easily could have been doc comments:
```javascript
var glob = require("glob")
// options is optional
glob("**/*.js", options, function (er, files) {
// files is an array of filenames.
// If the `nonull` option is set, and nothing
// was found, then files is ["**/*.js"]
// er is an error object or null.
})
```
eslint/docs/user-guide/command-line-interface.md:
#### `--no-inline-config`
This option prevents inline comments like `/*eslint-disable*/` or
`/*global foo*/` from having any effect. This allows you to set an ESLint
config without files modifying it. All inline config comments are ignored, e.g.:
* `/*eslint-disable*/`
* `/*eslint-enable*/`
(Note that /* eslint-disable */ cannot be written as // eslint-disable, even though ESLint strangely will accept // eslint-disable-line as a block comment or line comment.)
As a counterpoint, maybe we could look for a real world instance of *+ that would get misinterpreted under this proposal. But it has to be a project where the developers would actually care about it.
I use comments that span multiple lines in docblock comment code examples pretty often to show the result of a function call. It's inconvenient to not be able to use multi-line comments (using an escape).
What I want:
/**
Lorem ipsum.
```
foo();
/*
{
"x": 1,
"y": 2,
"z": 3
}
\*/
```
*/
(Using some kind of escape character. \ used here as an example)
What I currently have to do:
/**
Lorem ipsum.
```
foo();
// {
// "x": 1,
// "y": 2,
// "z": 3
// }
```
*/
Which looks more noisy and is more annoying to edit.
I don't think \*/ will work.
@sindresorhus Your suggestion would require changing the ECMAScript grammar. Even if we could do that, the \ escape would imply that backslashes need to pile up awkwardly inside nested comments.
For example, instead of this:
Nested escapes using *+
/**
* Here is an example of how to escape `*
*+/` using TSDoc:
*
* ```ts
* /**
* * Returns `true` if `s` is enclosed in `/*` and `*
* *+/`
* * comment characters.
* *
* * @example
* * ```ts
* * // This prints "true":
* * console.log(isDocComment("/** @public *
* *+/"));
* *
* * // This is supposed to print a newline character:
* * console.log("\n");
* * ```
* *
*+/
* ```
*/
...your approach would produce something like this:
Nested escapes using \
/**
* Here is an example of how to escape `\*/` using TSDoc: <--- SINGLE ESCAPE
*
* ```ts
* /**
* * Returns `true` if `s` is enclosed in `/*` and `\\\*/` <--- TRIPLE ESCAPE
* * comment characters.
* *
* * @example
* * ```ts
* * // This prints "true":
* * console.log(isDocComment("/** @public \\\*/"));
* *
* * // This is supposed to print a newline character:
* * console.log("\\\\n"); <--- QUADRUPLE ESCAPE confusingly inside a string?
* * ```
* \*/ <--- SINGLE ESCAPE
* ```
*/
I'm becoming increasingly convinced that *+ is the only workable general solution for this problem.
It's attractive that it also gives us an intuitive way to wrap long lines.
@rbuckton's suggestion of "find ways to avoid saying Voldemort" is a close second, though.
@octogonz \ was just an example. I'm happy with any escape character.
Another possibility (somewhat of an extension of the 'Voldemort' option) would be to recommend using some sort of Unicode hack to just never have that */ sequence in a comment. For example, use a different asterisk-like character instead of the actual *, like ﹡ (more possibilities). Or insert a zero-width space (U+200B) between the two characters. Of course it's not a great solution since it introduces some significant maintainability headaches, but it is one possibility that would require no changes to the parser.
@iansan5653 With arbitrary hidden or look-alike characters, the code snippets will fail to run when they are extracted and run by documentation tooling that looks for code blocks and converts them into live examples (e.g. it places the snippet in a <code> element for display, makes it contenteditable, and then runs the code in an <iframe>).
This problem is the one thing that makes me want to switch to /// comments as discussed in #160.
But the *+ notation above is a pretty workable compromise. I started on a PR to implement it a few weeks ago, but haven't had time to post it yet.
I've been wrestling with this problem with JSDoc for years, both from the side as a user trying to document things, and as the creator of jsdoc-md which generates markdown API docs from JSDoc.
> This seems like a lot of effort to work around a problem that could more readily be solved with "don't put */ in your code examples".
Here are a few common situations where you need to use */ in a JSDoc comment:
- When documenting a glob string as a default parameter value, e.g.:
/** @param {string} [glob='**/*.js'] Glob pattern. */
Real world examples:
- https://github.com/jaydenseric/jsdoc-md/tree/v9.1.1#function-jsdocmd
- https://github.com/jaydenseric/find-unused-exports/tree/v1.2.0#function-findunusedexports
- When documenting a CLI command containing a glob, e.g.:
/**
* @example <caption>CLI usage.</caption>
* ```sh
* npx jsdoc-md --source-glob **/*.{mjs,js}
* ```
*/
Real world examples:
- https://github.com/jaydenseric/jsdoc-md/tree/v9.1.1#cli
- https://github.com/jaydenseric/find-unused-exports/tree/v1.2.0#cli
- When documenting template literal comment tags that have utility. A very common example is how linters, formatters, and syntax highlighters hook off the leading /* GraphQL */ comment to process the following template string containing GraphQL SDL. In the code examples of my server and client GraphQL related packages, I want any example containing a GraphQL query string to have /* GraphQL */ in front of it to get users into a good habit, and so that the GraphQL SDL within the code example is Prettier formatted (via eslint-plugin-jsdoc and the jsdoc/check-examples rule).
Example 1:
/**
* @example <caption>Setup for a schema built with [`makeExecutableSchema`](https://apollographql.com/docs/graphql-tools/generate-schema#makeExecutableSchema).</caption>
* ```js
* const { makeExecutableSchema } = require('graphql-tools');
* const { GraphQLUpload } = require('graphql-upload');
*
* const schema = makeExecutableSchema({
*   typeDefs: /* GraphQL */ `
*     scalar Upload
*   `,
*   resolvers: {
*     Upload: GraphQLUpload,
*   },
* });
* ```
*/
Another example of this situation is babel-plugin-syntax-highlight, which uses template string comment tags for build-time syntax highlighting. I had to give up on JSDoc for documenting that package, and manually write code examples in the readme markdown.
Real world examples:
- https://github.com/jaydenseric/graphql-upload/blob/v11.0.0/public/GraphQLUpload.js#L35
- https://github.com/jaydenseric/graphql-react/tree/v12.0.1#examples
- https://github.com/jaydenseric/babel-plugin-syntax-highlight/tree/v2.1.0#usage
For jsdoc-md, I came up with the convention that */ within the content of JSDoc descriptions or types can be escaped with *\/, and before processing such content is run through an unescapeJsdoc function that simply replaces all occurrences of *\/ with */:
https://github.com/jaydenseric/jsdoc-md/blob/v9.1.1/private/unescapeJsdoc.js#L3-L24
Edit: Updated function: https://github.com/jaydenseric/jsdoc-md/blob/c37795fa1976e17aeb4acce1832795efefa093c9/unescapeJsdoc.mjs#L1-L22
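The convention described above can be sketched in a few lines. This is an illustration written for this thread, not the jsdoc-md source linked above, and the name unescapeCommentClose is an assumption.

```ts
// Illustrative sketch only (not the linked jsdoc-md implementation):
// replace every escaped "*\/" with the real comment-closing sequence.
function unescapeCommentClose(content: string): string {
  return content.replace(/\*\\\//g, "*/");
}

// The glob default from the first situation, as it would be written in the comment.
const escaped = String.raw`[glob='**\/*.js'] Glob pattern.`;
console.log(unescapeCommentClose(escaped));
```

The replacement is deliberately context-free, which is also why every tool in the chain (editor tooltips, lint rules) would need to apply the same step for the convention to work.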
There are problems with this convention though, unless the wider ecosystem adopts it. Firstly, VS Code displays the code example with the escape still there in intellisense tooltips:
Secondly, I haven't figured out yet how to get the eslint-plugin-jsdoc rule jsdoc/check-examples to do the unescaping before attempting to parse examples as JS:
> But in well-formed doc comments, the * framing always has a space after it.
Is this TSDoc specific? Because in JSDoc, there is no such stated requirement except that whitespace must follow the initial /**.
It seems the TypeScript playground respects *\/ now. It also respects other context-specific cases (e.g., a single backslash is shown as a backslash, as is a double one, and predefined entities like &amp; are not decoded when enclosed in backticks), so it seems to me that this *\/ convention is justifiable (with *\\/ being needed to create a literal *\/).
Did anyone consider allowing and filtering out the zero-width space (see Wikipedia)?
Drawback: It is invisible. Advantage: It is invisible.
Example:
/**
* @param pattern glob pattern
* @param nested when true add `**​/<glob>/​**`
^^ zero-width space between * and /
* @returns the set of matching files.
*/
If tsdoc filtered out the zero-width space, then examples become useful.
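A minimal sketch of such a filter, assuming it runs on the extracted comment text before rendering (the function name stripZeroWidthSpaces is hypothetical and not part of TSDoc):

```ts
// Hypothetical filter (an assumption, not a TSDoc feature): remove every
// zero-width space (U+200B) from extracted doc comment text.
const ZERO_WIDTH_SPACE = "\u200B";

function stripZeroWidthSpaces(content: string): string {
  return content.split(ZERO_WIDTH_SPACE).join("");
}

// The glob from the example above, with the invisible characters written as escapes.
const documented = "**\u200B/<glob>/\u200B**";
console.log(stripZeroWidthSpaces(documented));
```

Note that this removes every U+200B, including any that an author placed intentionally, so a real implementation might restrict the filter to the position between * and /.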