schemars Improve handling of Markdown comments

trafficstars

I was very pleased to see that schemars already includes code comments in the generated schemas. I mainly use schemars to improve the IDE experience, so this is really great.

This feature comes with a basic Markdown parsing feature: it removes single carriage returns (link to the code: https://github.com/GREsau/schemars/blob/master/schemars_derive/src/attr/doc.rs)

In the project I'm currently working on, the comments use more features of Markdown, including lists, code blocks and link references. These currently do not translate well to text.

As an example, here's a comment: https://github.com/fbecart/zinoma/blob/cc38f93734f0a8d79e12f8b5228fecb178bc68fe/src/config/yaml/schema.rs#L64-L90

This renders as the following JSON schema:

"targets": {
      "description": "A build flow is made of [`targets`]. Each target is a unit of work to perform as part of this build flow.\n\n[`targets`]: struct.Target.html\n\nTargets run in parallel by default. To run targets sequentially, you can define dependencies on other targets using the [`dependencies`] keyword.\n\n[`dependencies`]: struct.Target.html#structfield.dependencies\n\nEach target must have a unique name. The target name must be a string. It should start with an alphanumeric character or `_` and contain only alphanumeric characters, `-`, or `_`.\n\n__Example__\n\n```yaml targets: speak_cow: build: echo 'Moo' speak_dog: build: echo 'Woof!' ```\n\nIn this example:\n\n- `zinoma speak_cow` will print `Moo` - `zinoma speak_dog` will print `Woof!` - `zinoma speak_cow speak_dog` will print both `Moo` and `Woof!`, not necessarily in order.",
      "default": {},
      "type": "object",
      "additionalProperties": {
        "$ref": "#/definitions/Target"
      }
    }

Eventually, VSCode renders the documentation in this way:

I investigated a little bit further. IntelliJ only renders the first line of the "description", so it's usually fine. VSCode renders the description as plain text. It is not able to understand any Markdown or HTML.

Ideally, schemars would parse the comments with an actual Mardown parser and render the content to plain text from there.

I couldn't find any off-the-shelf crate able to convert Markdown to plain text. schemars could probably leverage https://crates.io/crates/comrak or https://crates.io/crates/minimad, but there is work to do in both cases.

May 28 '20 13:05 fbecart

I will simply use the title and description attributes for now: https://github.com/GREsau/schemars/blob/f9b14f7b00edcb4248c2243ddee9e116b798fa70/docs/1.1-attributes.md#schemarstitle--some-title-description--some-description

Jun 08 '20 06:06 fbecart

I have run into this issue as well. It seems that schemars forcibly re-wraps every paragraph somehow such that code blocks end up on a single line. I'm not entirely sure why it needs to do anything at all to the docstring in that regard. I would very much like to see schemars pass down the docstring unaltered as I use the jsonschema description attribute to generate markdown schema docs.

Jul 30 '20 07:07 untitaker

Have created a PR: https://github.com/GREsau/schemars/pull/45

Jul 30 '20 19:07 untitaker

@untitaker I understand that this solution is enough to improve support for comments that are heavy in code blocks.

But this toggle wouldn't solve my use case. Indeed, my documentation mixes plain text, links, code etc.

What I had in mind was to improve the handling of Markdown by using an actual parser, such as https://github.com/arranf/strip-markdown. This would respect all use cases: preserving line returns in code blocks, removing single line returns in plain text etc

Aug 01 '20 16:08 fbecart

@fbecart ok I just re-read your OP. could you explain how the schema ends up in vscode? vscode is definetly able to parse markdown in those tooltips as I used their APIs for this once (MarkedString IIRC), if just instructed to do so.

Aug 01 '20 19:08 untitaker

could you explain how the schema ends up in vscode?

Sure. The URL to Zinoma's schema is declared in SchemaStore: https://github.com/SchemaStore/schemastore/pull/1068/files

IDEs (VSCode, IntelliJ...) automatically pull the list of JSON schemas from SchemaStore. Here, if a file is called zinoma.yml, they will automatically associate the matching schema. This gives the user out of the box auto-completion, error detection and documentation in their text editor.

My understanding of the problem is the following:

Rust comments are expected to be in Markdown
In a JSON schema (https://json-schema.org/understanding-json-schema/reference/generic.html), the field description is expected to be a "string", which I interpret as "plain text".

I have observed that neither IntelliJ nor VSCode currently display correctly the descriptions in Zinoma's schema, because they are not expecting any Markdown there.

I lack a broad understanding of the usages of JSON schemas. But from what I can tell, what we'd need is a conversion from Markdown to text.

@untitaker What is your use case? How will the schema be used?

Aug 01 '20 20:08 fbecart

We're converting the schema to markdown docs: https://getsentry.github.io/relay/event-schema/

Main issue I have is that the docs are losing a lot of information if all the markdown is ripped out. I can live with unrendered markdown in my IDE tbh. Even if the jsonschema spec explicitly forbade markdown I would've probably gone ahead with this anyway.

Aug 01 '20 20:08 untitaker

Main issue I have is that the docs are losing a lot of information if all the markdown is ripped out.

That makes a lot of sense. Now I understand why you went with this solution. I agree that there is a need for a feature flag/option to pick the favorite "documentation rendering method":

either parse the markdown and render as text (which is the current behavior, although this implementation is not entirely correct)
either preserve the markdown as it is found in the Rust comment

And this what you've delivered.

Still, in order to help with my use case, I will try to use https://github.com/arranf/strip-markdown (this lib went under my radar at the time I created this issue).

Aug 01 '20 20:08 fbecart

it's probably possible to adapt my PR into something that takes a generic callback for processing documentation. The default could be the existing logic, and I could then hook in the identity function.

Aug 01 '20 20:08 untitaker

I'm not in the position to make a design decision, as I have a very limited knowledge of the codebase and the vision of the maintainer. As far as I can tell, both solutions should work.

Aug 01 '20 20:08 fbecart

Turns out that one can use #[schemars(description = "...")] to set a custom description (same with title).

We are now using this in https://github.com/getsentry/relay/pull/702.

I think this may help you achieve our goal. I'm closing my schemars PR as I no longer need this functionality.

Aug 01 '20 21:08 untitaker

I thought for a while of going that way (see https://github.com/GREsau/schemars/issues/38#issuecomment-640402721), but this represents a lot of comment duplication, which I figured would be particularly hard to maintain. A price I eventually refused to pay, feeling that this was instead truly an issue with Schemars.

Aug 02 '20 06:08 fbecart

I have run into this issue as well. It seems that schemars forcibly re-wraps every paragraph somehow such that code blocks end up on a single line. I'm not entirely sure why it needs to do anything at all to the docstring in that regard. I would very much like to see schemars pass down the docstring unaltered as I use the jsonschema description attribute to generate markdown schema docs.

The intention is that it removes extraneous newlines from simple text doc comments, since they may be spread across multiple lines just to keep the line length of the original Rust source reasonable, which ideally wouldn't affect any generated schema. However, this does indeed mangle more complex markdown e.g. code blocks and lists, so it might be doing more harm than good... I'm definitely considering removing this behaviour, and passing down the docstring unaltered as you suggest

In a JSON schema (https://json-schema.org/understanding-json-schema/reference/generic.html), the field description is expected to be a "string", which I interpret as "plain text".

I think your interpretation is overly strict. In particular, OpenAPI explicitly allows CommonMark in a schema's description.

@fbecart If you're still looking for an easy to get plain-text descriptions, my recommendation would be to define a custom Visitor that plainifies all schema descriptions recursively. I think this would do it (untested):

use schemars::visit::{Visitor, visit_schema_object};

#[derive(Debug, Clone)]
pub struct StripMarkdownVisitor;

impl Visitor for StripMarkdownVisitor {
  fn visit_schema_object(&mut self, schema: &mut SchemaObject) {
    if let Some(desc) = schema
      .metadata
      .as_mut()
      .and_then(|m| m.description.as_mut())
    {
      *desc = strip_markdown(desc)
    }
  
    visit_schema_object(self, schema);
  }
}

let schema = SchemaSettings::draft2019_09()
  .with_visitor(StripMarkdownVisitor)
  .into_generator()
  .into_root_schema_for::<MyType>();

Apr 25 '21 21:04 GREsau

Haha, I'm currently having the opposite problem. The string uses \n but this need to be <br/> to work nicely with the markdown parser. https://github.com/mrin9/RapiDoc/issues/572#issuecomment-928814662

But I'll handle this in Okapi in order to not break any other behavior for others.

Nov 25 '21 15:11 ralpha

I think your interpretation is overly strict. In particular, OpenAPI explicitly allows CommonMark in a schema's description.

@GREsau I believe you're right. Thanks for the solution you provided. I think the issue can be closed now.

Nov 25 '21 18:11 fbecart

@GREsau I believe you're right. Thanks for the solution you provided. I think the issue can be closed now.

I don't think we should close this just yet because of: (Edit: created new issue for this: #120 )

The intention is that it removes extraneous newlines from simple text doc comments, since they may be spread across multiple lines just to keep the line length of the original Rust source reasonable, which ideally wouldn't affect any generated schema. However, this does indeed mangle more complex markdown e.g. code blocks and lists, so it might be doing more harm than good... I'm definitely considering removing this behaviour, and passing down the docstring unaltered as you suggest

https://github.com/GREsau/schemars/issues/38#issuecomment-826395307

This might be able to be removed. But I'm not entirely sure one way or the other. I can see both sides. Maybe a solution would be create a visitor for this and allow users to remove or add this as they need.

To clarify for everyone we are talking about:

/// Some documentation
/// on multiple lines
/// right here.
/// 
/// This is a new paragraph.
struct Test{
   // ...
}

Should this result in:

"Some documentation on multiple lines right here.\n\nThis is a new paragraph." // Current (2021/11/25)
// or
"Some documentation\non multiple lines\nright here.\n\nThis is a new paragraph." // If we change is back

Edit: I'll create a new issue for this. As this is not what the initial question was. (has nothing to do with markdown anymore)

Nov 25 '21 18:11 ralpha

@GREsau There's really no good reason to strip excess newlines. By the CommonMark spec, these should already be treated as soft line breaks and handled during rendering: https://spec.commonmark.org/0.30/#softbreak

IMO, stripping the newlines violates the principle of least surprise since it breaks other Markdown features.

Apr 19 '22 19:04 abonander

Chiming in here after a few years, but VSCode will render markdownDescription if it exists. Maybe this should generate description with markdown stripped, and optionally inject markdownDescription if markdown was detected. It's not part of the json schema spec, but everyone is handling this differently.

Feb 14 '24 23:02 milesj

schemars schemars copied to clipboard

Improve handling of Markdown comments

schemars
schemars copied to clipboard