crates.io icon indicating copy to clipboard operation
crates.io copied to clipboard

Consider adding a configuration option for maximum metadata length

Open carols10cents opened this issue 6 years ago • 4 comments

Split off from https://github.com/rust-lang/crates.io/issues/1522. Relevant discussion:

Justin:

  • We should add a configuration entry for the global max size of the metadata (we currently use max tarball size several places)

Carol:

Do we really need a separate setting for this? Isn't max content length an effective maximum on metadata, because you could theoretically have a tarball that's 0 bytes and metadata that takes up the rest of the space? Just thinking in terms of what we would set this configuration option to if we had it!

I suppose that gets into how we want to resolve this comment and how we want to communicate this limit to someone whose crate is getting rejected who has a .crate file below our stated limits.

Justin:

I think it makes sense to add a separate config item here, but I don't know what we should set it to. Maybe we should add some logging to get a feel for some typical metadata sizes. The publish endpoint is fairly attractive from a DOS perspective, and the metadata could kick off a lot of database activity. I could potentially see a scenario where it would be nice to drop this limit quickly via an environment variable. Although I haven't really put much serious though into what such an attack might look like and if this would be an effective mitigation.

carols10cents avatar Nov 14 '19 18:11 carols10cents

Is there another issue to add the logging before this feature is implemented, or should I do a metadata size logging PR before a check and error response can be added?

smokytheangel0 avatar Nov 18 '19 17:11 smokytheangel0

Hi @smokytheangel0 ! There is not another issue for logging. Good catch, I mostly was concentrating on extracting the relevant info out of the other issue and I didn't really read the text I was copypasting very closely 😅

The logging is implemented here: https://github.com/rust-lang/crates.io/blob/2efe33df7183c2822e8b1b53e1090a510b6b6fce/src/middleware/log_request.rs and when you run cargo run --bin server and make requests locally, you'll see lots of stuff printed for local dev. The relevant lines that we log in production are the ones that start with at=info.

What we'd like to log is this metadata_length. To make that info accessible to the logger, I think the easiest way is, right after we get that metadata_length, add it to the req's extensions by calling mut_extensions().insert like we do with the User and then in the logger getting that value out again with req.extensions().find (you might need to wrap it in a custom type? it should be the only u64 in the extensions though)

Then only add something like metadata_length={} to the logged text if the value is present, because this is only relevant for publish requests.

Please let me know if you have any other questions or if anything I've suggested here doesn't work!

carols10cents avatar Nov 18 '19 17:11 carols10cents

completed pull request at https://github.com/rust-lang/crates.io/pull/1905 for adding metadata_length to logging

smokytheangel0 avatar Nov 18 '19 19:11 smokytheangel0

Bildschirmfoto 2023-09-22 um 15 12 50

this is the distribution of metadata_length values from the past 7 days. that outlier on the right is https://github.com/sagiegurari/cargo-make, which has a veeeery long readme, which is included in that metadata JSON blob.

axum, by default, uses a 2 MB limit on JSON request bodies. since this is essentially a JSON request body with an attached tarball body it seems reasonable to me to also use this 2 MB limit for the metadata part of the publish request body. this should be low enough to prevent attacks, but large enough to accommodate our regular publish requests.

Turbo87 avatar Sep 22 '23 13:09 Turbo87