cargo icon indicating copy to clipboard operation
cargo copied to clipboard

Add an unstable `--output-format` option to `cargo rustdoc`

Open LukeMathWalker opened this issue 2 years ago • 19 comments

Problem

rustdoc (on nightly) exposes an --output-format option to choose the format of the generated documentation: either HTML (the default) or JSON (the new unstable backend).

If you choose the JSON format, the output file of the unit is no longer /target/doc/<crate_name>/index.html; it becomes /target/doc/<crate name>.json.
Since there is no way to specify the output format as an option of cargo rustdoc itself, cargo assumes that the only possible output path for a doc unit is the HTML one. As a consequence, the JSON output is never considered "fresh" and we miss out on caching (see the test case in #12100, which fails on master).

Proposed Solution

Since cargo must remain unaware of the semantics of rustdoc's extra arguments, I suggest we add --output-format as an unstable cargo rustdoc argument, mimicking the semantics of the underlying --output-format option for rustdoc.

This would allow cargo to reason about it and take it into account in the fingerprinting logic.

Notes

No response

LukeMathWalker avatar May 07 '23 18:05 LukeMathWalker

It might be good to list all the concerns that need to be considered for this. For example:

  • What happens with --open? Does that generate an error?
  • Is there any interaction with -Z rustdoc-scrape-examples? Does the JSON include the scraped data? Or should scraping be disabled when generating the JSON?
  • This mentions adding it to cargo rustdoc, would it also be added to cargo doc? If not, is there a reason?
    • If we do add this to cargo doc, would it also generate JSON files for all the dependencies?

Overall seems like a reasonable flag to add to me.

ehuss avatar May 08 '23 16:05 ehuss

@obi1kenobi since you rely on rustdoc's output format, I'm curious on your thoughts on this issue.

epage avatar May 08 '23 16:05 epage

Thanks for raising those points @ehuss, I'll try to address them based on what I know.

* What happens with `--open`? Does that generate an error?

I don't think it's necessary to return an error, at least for the JSON format. Most browsers have native support for rich viewers when it comes to JSON documents, and I do find mysel opening the JSON docs in the browser from time to time for exploration. I suggest we keep honouring the current behaviour, even if the format is JSON.

* Is there any interaction with [`-Z rustdoc-scrape-examples`](https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#scrape-examples)? Does the JSON include the scraped data? Or should scraping be disabled when generating the JSON?

I'm not familiar with the feature, nor with its implications for the JSON format. @aDotInTheVoid perhaps?

* This mentions adding it to `cargo rustdoc`, would it also be added to `cargo doc`? If not, is there a reason?
  * If we do add this to `cargo doc`, would it also generate JSON files for all the dependencies?

Yes, adding it to cargo doc is the natural extension and I do expect it to behave as you describe—generate the JSON docs for all dependencies. There are some edge cases to keep in mind in that case (i.e. the JSON filename is the crate name, leading to issues if you have multiple versions of the same crate in your dependency tree), but they must probably must resolved regardless.

I have limited this issue to cargo rustdoc since that is the primary interface at the moment and it looks like a reasonable first step.

LukeMathWalker avatar May 08 '23 17:05 LukeMathWalker

Thanks for the ping @epage. I'm strongly in favor of this. It's a reasonable feature to have, and from my perspective, improving the rustdoc JSON cache hit rate is extremely worth it. Thanks to @LukeMathWalker for figuring this out.

Anything that improves rustdoc JSON performance would be hugely impactful for cargo-semver-checks, since that's by far the "long pole" in its perf graph right now.

For context, cargo-semver-checks v0.20 shipped a massive perf improvement: 109x faster checking time in tokio, 2354x in aws-sdk-ec2, etc. We've also added caching of the "baseline" rustdoc JSON which gives us another 2x speedup on top of those numbers.

Since we've significantly reduced the time taken by all the other steps, the "current" version's rustdoc JSON generation is easily 70-80% of cargo-semver-checks wall-clock time right now. In turn, our users report that cargo-semver-checks even after the optimizations can be up to 50% of their total CI time


the JSON filename is the crate name

This is ever so slightly imprecise, the filename is the lib target's name with all - chars turned into _. This bit cargo-semver-checks recently so I'm bringing it up so it doesn't bite anyone else.

-Z rustdoc-scrape-examples

I believe the JSON does not have an ability to include examples. It only includes doc comments that are explicitly set on items. For the time being, scraping can be disabled. I have no opinion at the moment on whether rustdoc JSON should include examples, and I think it's fine to decide on that later since the JSON format is unstable anyway.

--open

I'm fine with attempting to open the JSON file with the user's default viewer (mine would be VSCode), falling back to the browser if no default viewer is available. If this is complex to achieve on day 1, I'm also fine with this being an error for now.

cargo doc

Totally fine with adding it there too, it makes sense to me.

obi1kenobi avatar May 08 '23 18:05 obi1kenobi

Is there any concern here that remains to be addressed? Or should we start sketching the implementation?

LukeMathWalker avatar May 18 '23 07:05 LukeMathWalker

Most concerns seem to be answered. I got some minor questions here: How stable is the rustdoc JSON output? Does it make sense that a user wants to keep different versions of JSON output at the same time?

Apart from my trivial questions, I feel like the flag isn't something that cannot be rolled back, and It also won't affect major build workflow of Cargo. Thus I am happy to discuss the design and implementation.

weihanglo avatar May 18 '23 08:05 weihanglo

How stable is the rustdoc JSON output? Does it make sense that a user wants to keep different versions of JSON output at the same time?

It is versioned (via the format_version field), but a specific rustdoc build can only emit a single version of the output. Supporting multiple versions is primarily a "client-side" concern, if you want your tool to be able to work across multiple nightly releases (e.g. what @obi1kenobi does for cargo-semver-checks).

LukeMathWalker avatar May 18 '23 09:05 LukeMathWalker

Let's do a quick poll before starting.

@rfcbot fcp merge

Proposed solution

Since cargo must remain unaware of the semantics of rustdoc's extra arguments, I suggest we add --output-format as an unstable cargo rustdoc argument, mimicking the semantics of the underlying --output-format option for rustdoc.

This would allow cargo to reason about it and take it into account in the fingerprinting logic.

weihanglo avatar May 18 '23 10:05 weihanglo

Team member @weihanglo has proposed to merge this. The next step is review by the rest of the tagged team members:

  • [x] @Eh2406
  • [x] @Muscraft
  • [ ] @arlosi
  • [x] @ehuss
  • [x] @epage
  • [ ] @joshtriplett
  • [x] @weihanglo

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rfcbot avatar May 18 '23 10:05 rfcbot

So long as this is unstable, I have no concern with this. Once we have it,. it'll then also make it easier to explore the benefits and challenges with making it stable (assuming the rustdoc flag is stable)

epage avatar May 18 '23 14:05 epage

:bell: This is now entering its final comment period, as per the review above. :bell:

rfcbot avatar May 21 '23 03:05 rfcbot

There is no need to wait util FCP ends. People interested in doing it can start. The Cargo Team may not have a large capacity to mentor but for the review part we'll try our best.

weihanglo avatar May 21 '23 08:05 weihanglo

I'll get started next week then 👍🏻 Thanks!

LukeMathWalker avatar May 21 '23 10:05 LukeMathWalker

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

This will be merged soon.

rfcbot avatar May 31 '23 03:05 rfcbot

I'll get started next week then 👍🏻 Thanks!

Hi @LukeMathWalker , were you able to start working on this?

charmitro avatar Jun 05 '23 10:06 charmitro

No, a few other things had to take priority.

LukeMathWalker avatar Jun 05 '23 10:06 LukeMathWalker

No, a few other things had to take priority.

Can other folks(including me) give it a try? Else I believe it's better to assign it to yourself.

charmitro avatar Jun 05 '23 10:06 charmitro

If you have the time to start on it immediately, go ahead 😁

LukeMathWalker avatar Jun 05 '23 12:06 LukeMathWalker

@rustbot claim

charmitro avatar Jun 06 '23 16:06 charmitro

@obi1kenobi @LukeMathWalker The --output-format flag for cargo rustdoc is available on nightly. It would be much appreciated if you can provide feedback on the cache story, or other UX issues :)

weihanglo avatar Jan 19 '24 04:01 weihanglo