rfcs icon indicating copy to clipboard operation
rfcs copied to clipboard

Bundle local images in rustdoc output

Open GuillaumeGomez opened this issue 1 year ago • 19 comments

Tracking issue

cc @rust-lang/rustdoc

Rendered

GuillaumeGomez avatar Feb 27 '23 09:02 GuillaumeGomez

Additional prior art: the library embed-doc-image accomplishes the same goal, by means of embedding the images in data: URLs in the doc markdown, so that they need not be picked up by rustdoc itself.

Unrelated to the above, it would be neat if there was a way to embed images that were generated by the crate's own code. For crates where such generation is at all within their scope, this would both allow for “guaranteed fresh” sample output and charts (much like doc-tests guarantee that examples work), and remove the need for documentation images to be included in the .crate package file (increasing its download size whether or not anyone ever looks at the documentation) or in the source repository (increasing its history, especially if the images need frequent changes).

I imagine that this is implausible for now because it would require Cargo and rustdoc to cooperate on a multi-step process that makes cargo doc into something more like cargo test, but I thought I'd mention it in case it prompts any further ideas.

kpreid avatar Mar 04 '23 06:03 kpreid

@kpreid: Do you mean images that are generated by the build.rs script? If so, aren't OUT_DIR and CARGO_TARGET_DIR env variables enough to handle it? But in any case, from what I can see, you can do it like follows:

  • generate images in build.rs when running in doc mode
  • reference to generated images from the documentation using relative paths

If rustdoc sees a relative path, it'll copy the image into the target directory in any case so that should cover your use case as well. Unless I missed something?

If it's not possible for any reason (please explain why), are you suggesting to add an option to rustdoc which would allow to copy local files to the rustdoc output folder?

GuillaumeGomez avatar Mar 04 '23 10:03 GuillaumeGomez

Do you mean images that are generated by the build.rs script?

No, I mean images generated by code that is dependent on the code in the library or binary crate being documented (in the same way that integration tests are dependent on said code) — so it would be well after the build script stage. I doubt this feature I am describing should be part of this RFC; I am just mentioning it as a future possibility that would be powerful enough that, if there is any consideration here that might make that more or less feasible, perhaps it should be considered now.

kpreid avatar Mar 04 '23 15:03 kpreid

Sure. I'm still not sure to fully understand what you explained but I don't mind adding this as part of a future extension. How would you word it?

GuillaumeGomez avatar Mar 04 '23 16:03 GuillaumeGomez

I don't have any specific changes in mind — I just wanted to raise the idea to get people thinking about it for the future. On re-reading, the only specific thing I would suggest in this RFC is to make it clear that given ![resource title](path), the path is not always a relative path to a local file, but that it might also be

  • an absolute URL (as is currently supported and is presumably unchanged by this RFC)
  • a Rust item path (in the manner of existing intra-doc links, with the semantics of that not yet defined and currently an error treated like any broken link)

Also, I think the first part of my original comment may have been missed; I intended to suggest that the prior art section should discuss the library embed-doc-image which currently accomplishes the same goal, by means of embedding the images in data: URLs in the doc markdown, so that they need not be picked up by rustdoc itself.

The disadvantages of that library which this RFC would improve on are that all image data has to pass through rustc and a proc macro (inefficiency), and data URLs (limited maximum size, not viewable outside a browser, a burden on anything manipulating the HTML).

kpreid avatar Mar 04 '23 18:03 kpreid

a Rust item path (in the manner of existing intra-doc links, with the semantics of that not yet defined and currently an error treated like any broken link)

I don't get this part: it doesn't understand the concept of rust item, only URLs and local paths (either relative or absolute). If the path provided isn't found, it will warn and keep it as is.

an absolute URL (as is currently supported and is presumably unchanged by this RFC)

It is written explicitly in the RFC:

The path could be any relative or absolute file path. For example, to include...

Also, I think the first part of my original comment may have been missed; I intended to suggest that the prior art section should discuss the library embed-doc-image which currently accomplishes the same goal, by means of embedding the images in data: URLs in the doc markdown, so that they need not be picked up by rustdoc itself.

I'll add it then.

GuillaumeGomez avatar Mar 06 '23 15:03 GuillaumeGomez

The Motivation section discusses things other than images: specifically, scripts. But then the actual proposed new feature only works for images and has a lot of image-specific special casing.

Can the RFC text please discuss how to handle scripts, or other resources that aren't images? There's some discussion of this in "Possible Extensions" but I think that approach is unlikely to happen soon.

IMO ideally there would be a way to tell rustdoc to "just copy this file into a known place in the output, preserving its leafname".

ijackson avatar Mar 22 '23 16:03 ijackson

It's mentioned in the motivations section because it's something that we could want into the future. However, the approach for handling non-image files would be very different since there is no standard markdown syntax for it, which would force us to first define this syntax. So instead of going all at once in here, I think it's much better to first land this so we can have the code ready in case we want to add support for other files than images.

But I prefer to warn you beforehand: including scripts comes with security concerns and that will be very complicated to go forward with it. I think it'll very likely never be approved. Anyway, discussion for another day. :)

GuillaumeGomez avatar Mar 22 '23 16:03 GuillaumeGomez

There weren't many comments so I guess we can start the last part of the process.

@rfcbot merge

GuillaumeGomez avatar Mar 27 '23 11:03 GuillaumeGomez

Team member @GuillaumeGomez has proposed to merge this. The next step is review by the rest of the tagged team members:

  • [x] @GuillaumeGomez
  • [ ] @Manishearth
  • [x] @Nemo157
  • [ ] @aDotInTheVoid
  • [ ] @camelid
  • [ ] @jsha
  • [x] @notriddle

Concerns:

  • Crate size (https://github.com/rust-lang/rfcs/pull/3397#issuecomment-1553461246)

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

rfcbot avatar Mar 27 '23 11:03 rfcbot

Thanks for working on this. I really support the goals - consistent reliable images and scripts across local, self-hosted, and docs.rs.

But I'm dissatisfied with the implementation plan for docs.rs: embedding images in the .crate file uploaded to crates.io. I think it unnecessarily bloats .crate files, which wastes bandwidth for crates.io, for end-users, and for CI platforms. Docs.rs downloads each crate once; the various other users of crates download them millions of times. We shouldn't include bytes that are only useful for documentation. Can we find a different solution?

When we first started talking about this issue I was not worried about crate size because I figured this would be mostly used for project icons, and they would be ~5kB of SVG. But thinking about it more I realize some crates will want larger images to illustrate complex concepts. Most doc authors are not experts in minimizing image files, and rustdoc won't have any way to automatically minimize them. And I'd really like to have a rigorous way to use scripts, too; and useful scripts like KaTeX are in the ~200kB range.

IIRC crates.io's limit on crate size is 50MB. In some sense that's good since it means there's plenty of room for doc resources. But on the other hand it's bad since it would allow making very big crate files filled up with doc resources. Of course it is currently allowed to make very big crates anyhow, but this feature would create more incentive to do so.

@rfcbot concern Crate size

The main constraints that are challenging:

  • For docs.rs, we would like the doc build process to not have network access. It should receive a .crate file as input on the filesytem, and output documentation on the filesystem.
  • Local / self-hosted doc builds sometimes include docs for dependencies. When building dependencies, rustdoc only "knows" about the .crate file for that dependency.

For the first problem, docs.rs would have to introduce another input. We could make that input predictable, and pre-fetch it to the filesystem before running the build rather than having the doc build process fetch it. I don't love having another input in docs.rs but I think it would be better than making packages bigger for all non-doc uses. We could reduce the operational problems by only fetching that alternate input from one specific host - like crates.io, or another host created for the purpose.

For the second problem, there would have to be a well-known place to fetch additional resources for a given .crate file.

For user experience, we would want to avoid doc authors needing to create separate credentials for uploading doc resources. That argues in favor of having a way to upload additional files to crates.io, specifically linked to a crate version and using the same credentials as the crate itself. I know this makes for a much more complex project. I'd love to find a better way but am having trouble thinking of one!

jsha avatar May 18 '23 18:05 jsha

Very interesting take! I think we'll need to have opinions from @rust-lang/crates-io and eventually @rust-lang/cargo to move forward on this. I think having two different kind of resources for crates.io is actually a good idea where the non-code files could use a more aggressive compression? But I don't think it's a good idea to require two downloads (one for code and one for doc resources).

GuillaumeGomez avatar May 18 '23 18:05 GuillaumeGomez

Yeah I agree with @jsha's concern

Might be worth postponing the RFC, and the interested parties can go talk to the other teams until they have a new proposal (because I don't see this proposal being able to fit that concern without effectively being changed into something rather different)

Manishearth avatar May 21 '23 15:05 Manishearth

I generally support the goals of this RFC, but I agree with the previous commenters that we would need some way to discourage including huge resources in the crate tarballs. If the images end up being significantly larger than the source files then it's questionable whether it is worth including them.

Regarding the "having two different kind of resources" idea: I would be strongly against that. It seems like a lot of complexity for a comparatively small benefit.

Turbo87 avatar Jun 02 '23 09:06 Turbo87

I talked with @epage and he mentioned that in the future, cargo could potentially store all the content of a crate directly into an archive-like file, which would be used as is directly. If so, it would very likely completely remove this concern. I think the best track here is to actually wait for such a possibility to become reality (if it ever become one) before moving forward with this RFC.

GuillaumeGomez avatar Jun 02 '23 10:06 GuillaumeGomez

For context, Bevy would really love to have this for our docs to demonstrate various rendering and mathematical concepts. That however means shipping dozens to hundreds of shiny full resolution images (scaling them down will mean misleading artifacts, and SVGs simply don't apply). We would hit the 50 MB limit rather quickly if we were using this feature as aggressively as I would like from a purely docs/learning perspective. I definitely think we should avoid shipping all of the images to every user of the crate somehow.

alice-i-cecile avatar May 13 '24 21:05 alice-i-cecile

An "interesting idea" that crates could apply could be to use a CDN front around GitHub raw file access (e.g. Statically, no association) and release version tags to get the correct revision of the asset, e.g. #[doc = concat!("![img](https://cdn.statically.io/gh/bevyengine/bevy/v", env!(CARGO_PKG_VERSION), "/assets/docs/Mesh.png")], probably with a helper macro so it's easier to spell, e.g. #[doc = doc_img!(alt="img" src="Mesh.png"). It perhaps is far from ideal during development (since you either get the previous version's assets or broken links) but it should theoretically work.

CAD97 avatar May 14 '24 02:05 CAD97

Another idea would be to make image downloads opt in: cargo add bevy --doc-images. When opening the documentation without having the images downloaded it could just display the alt text and some warning to actually download them.

dev-ardi avatar May 14 '24 10:05 dev-ardi

It should be possible to ship images as a separate optional dependency with this RFC, assuming rustdoc tracks spans for relative path references

# bevy/Cargo.toml

[features]
example-images = ["dep:bevy-example-images"]

[dependencies]
bevy-example-images = { optional = true, ... }
// bevy/lib.rs

/// Some feature
#[cfg_attr(feature = "example-images", doc = bevy_example_images::some_feature!())]
pub fn some_feature() {}
// bevy-example-images/lib.rs

#[macro_export]
macro_rules! some_feature {
    () => ("![showcase of some feature](./some_feature.png)")
}

(EDIT: Though, this relies on dependency source-code access which the RFC tries to avoid for re-export inlining.)

Nemo157 avatar May 14 '24 10:05 Nemo157