
publish multiple crates, races with dependencies

Open ijackson opened this issue 4 years ago • 13 comments

Problem

My project has multiple crates in a cargo workspace. I use a shell loop to run cargo publish on each of the crates in dependency order.
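
Roughly, a minimal sketch of such a loop (directory names are taken from the log below; in reality the invocation goes through a wrapper script, as the output shows):

for crate in apitest wdriver; do
    # publish each workspace member in dependency order
    (cd "$crate" && cargo publish --no-verify)
done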

I get output like this:

+ cargo publish --no-verify
    Updating crates.io index
   Packaging otter-api-tests v0.6.0 (/volatile/rustcargo/Rustup/Game/server/apitest)
   Uploading otter-api-tests v0.6.0 (/volatile/rustcargo/Rustup/Game/server/apitest)
nailing-cargo: finished.  status 0.
+ nailing-cargo --no-nail --linkfarm=git --- sh -xec 
          find . ! -type l ! -type d ! -path './target/*' -print0               | xargs -0r rm --
          cd wdriver; cargo publish --no-verify 
        
nailing-cargo: out-of-tree, building in: `/home/ian/Rustup/Game/Build/server'
nailing-cargo: using really to run as user `rustcargo'
nailing-cargo: invoking: sh -xec 
          find . ! -type l ! -type d ! -path './target/*' -print0               | xargs -0r rm --
          cd wdriver; cargo publish --no-verify 
        
+ find . ! -type l ! -type d ! -path ./target/* -print0
+ xargs -0r rm --
+ cd wdriver
+ cargo publish --no-verify
    Updating crates.io index
   Packaging otter-webdriver-tests v0.6.0 (/volatile/rustcargo/Rustup/Game/server/wdriver)
error: failed to prepare local package for uploading

Caused by:
  failed to select a version for the requirement `otter-api-tests = "=0.6.0"`
  candidate versions found which didn't match: 0.5.1
  location searched: crates.io index
  required by package `otter-webdriver-tests v0.6.0 (/volatile/rustcargo/Rustup/Game/server/wdriver)`

As you can see, cargo publish for otter-api-tests v0.6.0 succeeded. But cargo publish for otter-webdriver-tests then failed, complaining that otter-api-tests = "=0.6.0" could not be resolved.

I tried working around this with sleep 10. That wasn't enough.

I looked in cargo help publish to see if there was a way to turn off the dependency check, but I didn't see one. I also looked at various commands to see if I could perhaps poll the registry, but nothing seemed relevant. cargo search seems too DWIM, even with postprocessing of the output, and I am not convinced (from my experience publishing otter 0.5.1) that this would necessarily work.

Steps

  1. Make two crates, A and B, where B depends on the precise version of A.
  2. cd A && cargo publish && cd ../B && cargo publish
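
A minimal sketch of those steps, using hypothetical crate names crate-a and crate-b in place of A and B (the manifest metadata that cargo publish requires, such as description and license, is omitted):

cargo new --lib crate-a && cargo new --lib crate-b
cat >> crate-b/Cargo.toml <<'EOF'
[dependencies]
crate-a = "=0.1.0"
EOF
# the second publish can race the registry's index update for the first
(cd crate-a && cargo publish) && (cd crate-b && cargo publish)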

Possible Solution(s)

Ideally whatever service cargo publish talks to wouldn't say "yes" until it had done it. Or maybe cargo publish could at least have an option to wait.

Notes

Output of cargo version:

cargo 1.54.0-nightly (e51522ab3 2021-05-07)

I am slightly perplexed since I don't understand how large Rust projects with multiple crates do their publish.

ijackson avatar May 23 '21 17:05 ijackson

Sorry for the inconvenience!

Ideally whatever service cargo publish talks to wouldn't say "yes" until it had done it. Or maybe cargo publish could at least have an option to wait.

That is how things used to work. Unfortunately this led to a lot of problems, as it would often time out the connection, leaving things in an inconsistent state. There was a whole plan for making a better publish API, but I think this one part is the only thing that got done. A follow-up part was to have a way to poll whether the publish had gone through, but I don't think it has been made yet. You can probably curl https://raw.githubusercontent.com/rust-lang/crates.io-index/master/ot/te/otter-api-tests to see if it has completed, as a hack. I think an option to have cargo check itself and not return until it sees it end up in the index would make sense, if you want to push that forward.
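
For example, a one-off check along those lines (a sketch, assuming the index's compact one-JSON-object-per-line layout):

# prints a line if (and only if) 0.6.0 has reached the index
curl -sS https://raw.githubusercontent.com/rust-lang/crates.io-index/master/ot/te/otter-api-tests | grep '"vers":"0.6.0"'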

Eh2406 avatar May 23 '21 19:05 Eh2406

Sorry for the inconvenience!

Thanks for the helpful reply! It makes a lot of difference to know I'm not going mad, and this is actually a real problem rather than me just having got the wrong end of several sticks :-).

You can probably curl https://raw.githubusercontent.com/rust-lang/crates.io-index/master/ot/te/otter-api-tests to see if it has completed, as a hack.

Thanks. That is a really helpful suggestion, which I will adopt.

I think an option to have cargo check itself and not return until it sees it end up in the index would make sense, if you want to push that forward.

I think that would be great. I'm not sure I will get around to trying to get stuck into cargo to add that soon, so in the meantime thanks for the workaround!

ijackson avatar May 23 '21 21:05 ijackson

Yea, unfortunately this is a known issue with background updates (more information at https://internals.rust-lang.org/t/changes-to-how-crates-io-handles-index-updates/9608). I'm surprised that a 10-second sleep wasn't enough, as I thought the delay was usually quite short. Perhaps there was a temporary buildup in the queue or some kind of outage?

Some of the coordination for a new publish API is at https://github.com/rust-lang/crates-io-cargo-teams/issues/82, where hopefully issues like this would be resolved.

We could potentially put a temporary fix in Cargo to simply sleep if resolution fails and try again. It is done here.

ehuss avatar May 24 '21 15:05 ehuss

Well, thanks for the curl trick. It does seem to work. (Shell fragment below the cut for your, err, edification.)

I observe, though, that raw.githubusercontent.com is a CDN. It seems to have some lag. In the upload of my first crate for this release cycle, my sleep-retry loop waited a total of 322 seconds. The last sleep was 40s so it took between 282 and 322 seconds. 5 minutes would be 300s and is the kind of number people choose as a cache timeout.

As I'm writing this the 3rd crate is waiting, after another similar delay for the 2nd. I guess the whole thing is going to take about half an hour although I'm optimistic that it will, in the end, complete successfully.

I also noticed that viewing the URL from another network location gave different answers.

# https://github.com/rust-lang/cargo/issues/9507
# Polls the raw crates.io index until $version of crate $1 appears.
# Assumes $cratesio_raw_url and $version are set by the calling script.
wait_for_crates_io () {
    local p=$1
    local delay=1
    # index files for this crate live under a two-char/two-char prefix derived from the name
    local url="$cratesio_raw_url/${p:0:2}/${p:2:2}/$p"
    printf >&2 "waiting for upload of %s to take effect" "$p"
    while sleep $delay; do
        printf >&2 .
        # each index line is a JSON blob for one version; select the one matching $version
        local got=$(
            curl -sS "$url" | jq '.vers | select(. == "'"$version"'")'
        )
        if [ "x$got" != x ]; then break; fi
        # back off gradually rather than hammering the CDN
        delay=$(( $delay * 11 / 10 + 1 ))
    done
    printf >&2 'done\n'
}
(seems to produce some noise on stderr for a completely new crate, but does work in the end...)
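
For reference, a hedged usage sketch of the fragment above ($cratesio_raw_url and $version are assumed to be set by the surrounding release script):

cratesio_raw_url=https://raw.githubusercontent.com/rust-lang/crates.io-index/master
version=0.6.0
(cd apitest && cargo publish --no-verify)
wait_for_crates_io otter-api-tests   # blocks until 0.6.0 is visible in the raw index
(cd wdriver && cargo publish --no-verify)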

ijackson avatar Jun 08 '21 20:06 ijackson

cargo-release and cargo-smart-release support workspace publishes. In the meantime, I'd recommend using them (cargo-release focuses on adapting to many workflows, while cargo-smart-release focuses on making it very smooth to release a specific subset of workflows).

The way they implement this is by doing registry updates until the published crate is found in the registry, with a sleep in between. We could make a PR for cargo to do something similar (been meaning to do it since we solved this but never got around to it). I think the only question I've had is whether we should always wait for publish or if there should be a flag for it. @ehuss, any thoughts, including how specific we name the flag if we go that route?

As for sleeps, cargo-release had used fixed sleeps several times as we worked to get the polling working, and we just never found a good short value. We had some, but there was always a user who seemed to need it longer, so we just allowed people to set their own with an env variable.

epage avatar Feb 28 '22 14:02 epage

doing registry updates until the published crate is found in the registry, with a sleep in between.

I have a shell script to do precisely this.

We could make a PR for cargo to do something similar (been meaning to do it since we solved this but never got around to it). I think the only question I've had is whether we should always wait for publish or if there should be a flag for it.

I definitely think this feature should be in cargo. Probably there should be a way to turn it off. Having it off by default in at least some situations is probably a wise conservative choice. This leads to the conclusion that there ought to be a command-line option to control it.

The only remaining question is what the default should be. The cautious choice would be to make the default be to not wait. I think that would be an improvement and ought not to be blocked on stability questions. Changing the default later would be fine I think.

Maybe in the end the right default is to wait iff a workspace is in use.

ijackson avatar May 06 '22 15:05 ijackson

Personally, I'm fine with cargo always waiting. cargo publish used to always wait but stopped when crates.io changed its behavior.

If it's configurable, I personally see waiting as the safer default, because people might automate around publish without realizing they need to wait, and then every once in a while it will fail in a non-reproducible way.

epage avatar May 06 '22 15:05 epage

Looking into how to automate testing of a fix.

It looks like the current registry tests use a file URL for the registry which causes everything to be immediately available.

There is cargo_test_support::registry::serve_registry for proxying that maps HTTP GETs to files. This seems like the best candidate for testing. We'd need to add PUT support so we can handle the publish operation. The main question is how to delay the publish so it doesn't immediately show up, and how to coordinate that with the test.

epage avatar May 24 '22 19:05 epage

Here are my two cents: perhaps whatever crates.io API call is used for publishing should respond success only after the crate is actually available. Otherwise, it's more like "yeah, I have your crate. It'll be available soon...".

mightyiam avatar Jun 03 '22 16:06 mightyiam

Before, crates.io did block until it was published, as the calls were being serialized. They switched to the API call enqueuing the publish. The concern I'd have with crates.io transparently blocking is if a timeout gets hit. Before, it most likely meant your request didn't happen. Now, it could mean either that your request wasn't enqueued or that crates.io hasn't processed your crate yet. I feel like a more explicit handshake would be ideal.

epage avatar Jun 03 '22 17:06 epage

Here's a bash script we use: https://github.com/mobusoperandi/michie/blob/9a190b78a5ef540c261f2af0ca5a820e18910746/Makefile.toml#L107-L124.

mightyiam avatar Jun 07 '22 16:06 mightyiam

Testing notes

@Eh2406 said:

@Ed Page let's take as a given that we want to leave the logic for publish to happen asynchronously and as direct writes to the file system. Here's a way to get the behavior tested:

  1. Generate the index files and the crate files (direct on the file system) by calling .publish()
  2. Use api_responders to override "api/v1/crates/new" to set a flag.
  3. Use api_responders to override "index/..." to return 404 until the second time it is hit after the flag is set.
  4. Call cargo publish

epage avatar Jul 21 '22 15:07 epage

In creating #11062, we highlighted that cargo's tests use a mock registry that doesn't complete the publish and causes timeouts.

We plan to fix up the mock registry so it will work, but this helped call attention to what might happen if a registry has special needs.

Quoting @Eh2406 in https://github.com/rust-lang/cargo/pull/11062#issuecomment-1242873601

We should also keep in mind in designing this that not all registries necessarily publish promptly. It is perfectly reasonable for a registry that accepts the publish API to require some kind of intervention before it appears in the index. I could see a registry requiring a user to review the package they just uploaded to verify that it looks correct before adding it to the index. A registry could also do an automated or manual security audit before it is added.

Probably the ideal solutions to this are

  • Make it configurable in a way tied to the registry for end-users to resolve needs. We could combine enable/disable with the timeout by having 0 mean "don't even check"
  • Allow the registry to tell cargo what the expected behavior is

Again, quoting @Eh2406 but in https://github.com/rust-lang/cargo/pull/11062#issuecomment-1245645198

We should also keep in mind in designing this that not all registries necessarily publish promptly.

To start speculating on an RFC... Thinking about this more, I think the right place for a registry to tell us how long to wait is in the response to the /api/v1/crates/new call. Either in the response object or in headers. I see a couple of reasonable things a registry could want to tell a user about how long to wait:

  • Retry for some expected period of time (the behavior you're adding here) possibly with the ability for a registry to estimate how long the retries should run for.

  • A URL that needs to be interacted with in order for progress to be made. For the cases where it requires the author to do a manual review.

  • A URL that can be used to track the progress of publication. (Possibly with some specification for how cargo can use it as a retry loop.)

  • Clarification that the retry loop is unlikely to be productive. Say if all publishes require manual review by the registry.

The idea even came up for us to do a phased stabilization by having this start as opt-in and move to opt-out.

However,

  • This requires the configuration to be stable from the start, which requires resolving things like naming
  • We'd then have to ride this phased stabilization plan on the timetable of Rust's release schedule
  • This is a blocker for making sparse registry the default because of the effect of the extra latency (it is being reduced compared to nightly today)
  • While it isn't a blocker for stabilizing the sparse registry, users will likely opt in without awareness of the consequences and run into issues, or those who do wait will be blocked until this is resolved
  • Having the workaround be unstable is a (negative) incentive for people to give us feedback to help shape the long-term plan (or even know if it's worth it)
  • We expect the scope of people being affected to be a fraction of a fraction, and they can work around it by hitting ctrl-c (if their process is automated, then they likely need this change)

As an alternative, our plan is

  • Provide a low-effort unstable workaround
  • Intentionally miss the cut to beta next week so we get a full 12 weeks of testing, 6 on nightly and 6 on beta
  • Put this in the TWiR testing section

epage avatar Sep 13 '22 18:09 epage

From @ehuss in #11062

I was curious why this waits after publish, instead of waiting for dependencies to be available? The original intent I had for #9507 was to only block publishing to wait for dependencies to become available. That way, it doesn't affect the majority of cases where someone is publishing a single package. Do you have any thoughts on those approaches?

I'm a little concerned about always blocking until it is available. There are times where there can be a significant delay, and the user may not care, but would be bothered if publishing gets "stuck".

I can see some benefit of blocking before finishing, such as someone wanting to publish an announcement. But I'm a bit wary of forcing that behavior.

(moved here as this ties into a lot of other pertinent comments)

epage avatar Sep 21 '22 20:09 epage

I was curious why this waits after publish, instead of waiting for dependencies to be available?

Can we do much better than broadly swallowing errors, assuming a specific error case, and polling for it to be resolved? If not, that feels too brittle to me. If there is an unrelated error, the user is blocked until the timeout. The user has no alternative, like ctrl-c, as that will kill the whole publish.

We also know that the block-on-publish tends to work for people as that is how cargo+crates.io used to behave.

It also seems like this leaves the door open for other race conditions. You mentioned announcements (I actually tend to wait on docs.rs so that is moot for me). There is also the fact that the delay with git registries is already high enough that at times I could publish in one project and then try to consume it in another project just to see it fail. I feel like telling users when it is available would give a better experience.

I'm a little concerned about always blocking until it is available. There are times where there can be a significant delay, and the user may not care, but would be bothered if publishing gets "stuck".

We've tried to wordsmith the message to let users know it's safe to ctrl-c if they wish.

But I'm a bit wary of forcing that behavior.

  • Making blocking opt-in is most likely to lead to user error and frustration
  • Making it opt-out on the CLI seems to have negligible benefit as the user has to think a priori "I don't care about waiting for this" which is unlikely to happen
  • Making it opt-out in config is probably the ideal solution (see my earlier comment) but the question is if it is worth blocking any improvement on figuring out all of the details on this. Our plan from the cargo team meeting was to use the 12 weeks for release and TWiR testing section to provide opportunities for people to run into this and provide input. We'd have an unstable escape hatch to not fully block people
  • As I said, I worry that guessing when to do this will be too brittle and instead cause other, worse problems

epage avatar Sep 21 '22 20:09 epage

Request for Testers

Background: Originally, cargo publish would block until crates.io had finalized everything and made the crate available for use. This worked because crates.io finalized the publish synchronously with the publish request. crates.io was later changed so that the publish request enqueues the crate and cargo publish completes immediately. This makes it harder to publish multiple crates in a row that depend on each other and increases the chance of failure when publishing one crate and then immediately using the result in another crate. While the time from being posted to the registry to being readable in the index can vary, one day I saw it consistently taking 7 seconds.

The problems with asynchronous publish are expected to be exacerbated with the soon-to-be-stabilized sparse registry (#11224).

  • The delay is expected to be worse
  • Any tool accounting for the delay that is written against the git registry, like cargo-release, will likely fail when the user has the sparse registry enabled as there will be a delay between when cargo-release will think the crate is accessible (visible in git index) and when it will actually be accessible (visible in sparse index).

Behavior being tested: In #11062, cargo publish will update the index in a loop until the published crate is available, restoring the original user-visible behavior on the client-side (cargo publish). At the moment, cargo publish polls every 1 second and will timeout after 60 seconds with a warning. This was merged insta-stable and is expected to be included in 1.66.

Focus Areas

  • Alternative registry users
  • Non-standard cargo publish workflows

Workaround: You can disable polling by setting the nightly-only publish.timeout config.toml field.
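
For instance, a minimal sketch of that opt-out (assuming, per the discussion above, that a timeout of 0 means the index check is skipped entirely):

# append to .cargo/config.toml; the publish.timeout field is nightly-only at the moment
cat >> .cargo/config.toml <<'EOF'
[publish]
timeout = 0
EOF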

epage avatar Oct 30 '22 01:10 epage

I assume we need to remove the check from cargo-release and cargo-workspaces when sparse registry is stabilized, right?

pksunkara avatar Nov 04 '22 09:11 pksunkara

No reason to remove it immediately. Crates.io will remain available via both git and sparse APIs. Having the code around will help for people who have not upgraded cargo yet.

epage avatar Nov 04 '22 14:11 epage