Change rustdoc-scrape-examples to be a target-level configuration
This PR addresses issues raised in rust-lang/cargo#9525. Specifically:
- It enables examples to be scraped from
#[test]functions, by passing additional flags to Rustdoc to ensure that these functions aren't ignored by rustc. - It moves the
argfrom-Z rustdoc-scrape-examples={arg}into a target-level configuration that can be added to Cargo.toml.
The added test scrape_examples_configure_target shows a concrete example. In short, examples will be default scraped from Example and Lib targets. Then the user can enable or disable scraping like so:
[lib]
doc-scrape-examples = false
[[test]]
name = "my_test"
doc-scrape-examples = true
Somehow I feel like it'd be better to structure the configuration as something like:
[doc]
scrape-examples-targets = ["lib", "test"]
rather than putting doc-scrape-examples in each target section. Just a gut reaction though.
Somehow I feel like it'd be better to structure the configuration as something like:
[doc] scrape-examples-targets = ["lib", "test"]rather than putting
doc-scrape-examplesin each target section. Just a gut reaction though.
I think the issue with this is (a) it's not consistent with existing ways to configure targets, and (b) it doesn't allow target configuration at a higher granularity. For instance, the proposed method allows you to individually enable/disable specific examples and tests, while a list of target strings can't do so.
This all looks pretty reasonable to me, but I'd like to confirm my understanding. The intention is that once stabilized this is enabled for all projects by default? So all projects with libraries and examples, when using cargo doc, will see examples for methods and such drawn from libraries examples. Packages have fine-grained control at the per-target level what's used as candidates for examples and what isn't (with the default being only the lib targets and examples)
If that's the case I think it might be good to flesh out the failing tests for now. I think there may need to be some trickery around not running doc-scrape with stable Rust as-is today since Cargo tests its CI with all three channels of Rust. Otherwise though that seems pretty reasonable to me and the PR otherwise looks pretty good.
What is the right set of default targets to search for examples?
Personally the choice made here I think is fine. This seems like something we can alter over time if necessary as well.
Is there a different granularity at which example scraping should also be configurable?
At least from Cargo's perspective this probably won't have much impact, but I don't have much of an opinion on this otherwise.
As a random bikeshedding point seeing doc-scrape-examples = false in isolation feels a little surprising, so I might suggest something like doc-example-candidate = false or something like that.
I thought this feature would be disabled by default?
I thought this feature would be disabled by default?
My understanding is that it would be disabled by default while unstable, and then enabled by default once stabilized. The vast majority of people will not use this feature if it is hidden behind a flag.
My understanding is that it would be disabled by default while unstable, and then enabled by default once stabilized. The vast majority of people will not use this feature if it is hidden behind a flag.
Sorry, I definitely didn't understand it like this. In this case, we'll need to update the display quite a lot because it takes a lot of space by default. What about collapsing the section by default with a title like "see usage in examples" or something among the line? If you prefer we can discuss it on zulip.
The discussion about scraping-by-default-or-not has been pulled onto Zulip: https://rust-lang.zulipchat.com/#narrow/stream/266220-rustdoc/topic/Enabling.20scrape.20examples.20by.20default
:umbrella: The latest upstream changes (presumably #9992) made this pull request unmergeable. Please resolve the merge conflicts.
@willcrichton The .rustfmt.toml change and new log file, seem unintentional.
Ah thanks @lqd , got caught up in the rebase. I'll squash everything in a bit.
:umbrella: The latest upstream changes (presumably #10533) made this pull request unmergeable. Please resolve the merge conflicts.
This PR is ready for review, although it needs a new Cargo reviewer since Alex has stepped back (is there a way to nominate a new reviewer?)
As a reminder, this PR will enable the scrape examples feature by default, although only when run with a nightly rustc. Now that rust-lang/rust#93217 has landed, I think this feature is ready to be rolled out at least on docs.rs. Rustdoc maintainers, please let me know if you have any remaining concerns about this feature.
cc @jsha @camelid @GuillaumeGomez
@rustbot ready
It seems like only @ehuss is on reviewer rotation for Cargo right now, so I will add him. Sorry for all the extra work Eric :(
r? @ehuss
This seems to be a very unusual way of stabilizing something. Can you clarify exactly how this is intended to work? I'm skeptical that our setup can handle stabilizing in this fashion, so I'm a little concerned. Is there an intent to stabilize the rustdoc side? Usually that is done before the cargo side.
I'm also a little surprised to see this moving for stabilization so quickly. I don't have much of an impression that it has gotten much testing. I also haven't seen much discussion from the rustdoc side.
My goal here is to provide a bigger roll out for this feature before fully stabilizing it. There's been a lot of back-and-forth on the UI, but those discussions were finally settled (for now at least) in rust-lang/rust#93217. Essentially this change will activate the scrape examples feature by default on docs.rs, which uses nightly. More users will encounter the UI, and we we will be able to identify any remaining Cargo issues from broken doc builds. Once users are satisfied with the UI and all Cargo issues are settled, then we can finish stabilizing the feature.
Sorry, it may be misleading to say this PR "prepares the feature for stabilization". I mean preparing in a looser sense, as in "is the biggest remaining step before stabilization".
This does seem to actually stabilize it, though, in the sense that there is no longer a -Z flag to gate it. Cargo specifically has a system to make things like this require an explicit opt-in before it is stabilized. Users should be able to use the nightly channel the same as the stable channel.
I might suggest a different way to try to get more exposure. Would it be possible to just enable it on docs.rs by default? Have you talked to them about it?
Also, another tactic is to do a crater run with it enabled, just to see what basic problems might arise. I would do that before docs.rs. (I would also make sure the issues raised in this PR are fixed first.)
Ok that makes sense. Since this isn't stable yet, I restored the -Zrustdoc-scrape-examples flag. But now instead of passing =all or =examples, that configuration is still done at the target level in Cargo.toml. I will also do a crater run this weekend to ferret out any issues. I'll check with the docs.rs folks and see if they're willing to add the flag.
Update: I did a crater run of the scrape-examples feature as it exists on nightly on the top-10,000 crates (after implementing rust-lang/crater#626). In total there were 131 regressions. I could not find any issues directly in the implementation of scrape-examples. Instead, regressions had one of three causes:
- The project's examples did not type-check.
- The project itself did not type-check (e.g.
cargo checkwould fail whilecargo docwould work). - Dependencies of the examples did not type-check.
Aside: I'm actually a bit surprised that cargo publish includes examples by default, but doesn't check if they compile. Is that intended behavior? Ensuring that examples work at publish-time would avoid the first issue.
So the question then is whether these regressions matter. If it's not important that cargo doc should still work in these cases, then I think the feature is ready for deployment on docs.rs, especially once this PR lands. But if it is important, then I think the only fix would be to somehow mark the scrape units in Cargo as "fallible" or "optional", like if they fail, the build should still continue. I'm not sure if Cargo has such a concept, or how easy it would be to implement.
@ehuss what are your thoughts on this?
(note: also pulling this bigger discussion onto the rustdoc Zulip channel)
After some discussion with @jyn514 on Zulip, it's clear that a Docscrape unit should be able to fail without killing the build. Some crates (but not most!) will rely on the fact that cargo doc will work while cargo check will not, and that is fundamental to the design of rustdoc.
So I think what I'm going to do is keep this PR scoped to its original goal, of changing the target configuration for the scrape-examples feature (plus the couple bugfixes that have emerged in review). Then I'm going to put up a second PR that adds the ability to mark units as "ok to fail", which will be like a more granular version of BuildConfig::keep_going, and mark all Docscrape units as such. So @ehuss let me know if you have any more comments on this PR as is.
:umbrella: The latest upstream changes (presumably #10868) made this pull request unmergeable. Please resolve the merge conflicts.
@ehuss I just implemented the strategy you described here, and added a set of corresponding unit tests. Let me know if you think this captures your intent.
One implementation note: I'm using state.emit_diag("warning", ...) to emit the warning when silenced scraped units fail. I'm not sure if that's the correct API to use.
Can we move all of the docscrape tests to a separate module? Having doc.rs climb past 3,000 lines is making it a bit too heavy-weight, and it can be convenient to keep new features in their own little scope. I'll leave it up to you if it would be easier to do before or after this PR.
There doesn't seem to be anything that will enabling scraping of targets with doc-scrape-examples=true set (or to disable it if doc-scrape-examples=false).
I see that may be difficult to implement because has_dev_units needs to be determined before running the resolver, but it doesn't know what is necessarily being built before running the resolver. Perhaps one workaround is to call ws.members_with_features(…) and check if any of those packages have example targets with doc-scrape-examples=true, and if so enable dev-dependencies, and change the CompileFilter to include those targets.
Conversely, I imagine there should be some filter somewhere to prevent targets with doc-scrape-examples=false from being implicitly included.
Can we move all of the docscrape tests to a separate module? Having doc.rs climb past 3,000 lines is making it a bit too heavy-weight, and it can be convenient to keep new features in their own little scope. I'll leave it up to you if it would be easier to do before or after this PR.
Done, moved to a new docscrape.rs.
There doesn't seem to be anything that will enabling scraping of targets with doc-scrape-examples=true set (or to disable it if doc-scrape-examples=false). I see that may be difficult to implement because has_dev_units needs to be determined before running the resolver, but it doesn't know what is necessarily being built before running the resolver. Perhaps one workaround is to call ws.members_with_features(…) and check if any of those packages have example targets with doc-scrape-examples=true, and if so enable dev-dependencies, and change the CompileFilter to include those targets.
I just added that here, based on the strategy you recommended:
https://github.com/rust-lang/cargo/pull/10343/files#diff-98da1b66b532e50e9bca971b453ee7de96b17436de1d7ef3824c9888d55be9bbR377-R395
Conversely, I imagine there should be some filter somewhere to prevent targets with doc-scrape-examples=false from being implicitly included.
That is the purpose of this check after running generate_targets:
https://github.com/rust-lang/cargo/pull/10343/files#diff-98da1b66b532e50e9bca971b453ee7de96b17436de1d7ef3824c9888d55be9bbR535-R543
@rustbot label +S-waiting-on-author -S-waiting-on-review
Error: The feature relabel is not enabled in this repository.
To enable it add its section in the triagebot.toml in the root of the repository.
Please file an issue on GitHub at triagebot if there's a problem with this bot, or reach out on #t-infra on Zulip.
:umbrella: The latest upstream changes (presumably #9892) made this pull request unmergeable. Please resolve the merge conflicts.
I did another Crater run on the top 10k packages with the latest scrape examples implementation. I realized that the current strategy does not handle the following case:
- A crate's library documents but does not compile.
- The crate has an example and no dev-dependencies.
- Cargo tries to opportunistically scan the example.
- To scan the example, Cargo needs to check the example's dependencies, which includes the crate library.
- Cargo compiles the crate library and fails. The lib-check unit is not opportunistic, so its failure crashes the entire build.
I can only think of a few solutions to this:
- General solution: essentially what I proposed in #10596. If any transitive dependency of a fallible (or opportunistic or what have you) unit fails, then that has to kill the build up to the fallible dependency. This will solve the problem robustly, but adds a lot of new concepts and implementation complexity.
- Specific solution: if a lib-check unit fails AND it's a dependency of a scrape unit, then don't crash the build. This solves the specific problem, but adds yet another edge case to Cargo.
- Scrape examples becomes opt-in: we give up on trying to make any kind of default behavior for scrape examples, and users have to opt-in by passing some kind of flag or Cargo.toml parameter. This is guaranteed backward-compatible, but would cause the feature to be used far less.
@ehuss What do you think strikes the right balance between feature impact and Cargo tech debt? Or do you see another possible solution?
@willcrichton just fyi - I would recommend trying to land the changes you have so far before trying to make more. This PR is already quite big and it's hard to skim the comments, having a separate issue for the problem you found would be helpful I think.
That's sensible, I don't want to expand the scope of this PR too much. I'm happy to have this PR be merged as-is if we want to pull this discussion elsewhere.