Non-deterministic snapshots on Windows x64

Open athomas opened this issue 1 year ago • 3 comments

https://ci.chromium.org/ui/p/dart-internal/builders/ci/dart-sdk-win-beta/165/overview exposed non-deterministic Windows x64 snapshots (strangely, ia32 snapshots seemed to be deterministic).

Non-deterministic snapshots:

  • bin/snapshots/analysis_server.dart.snapshot
  • bin/snapshots/dart2js.dart.snapshot
  • bin/snapshots/dartdevc.dart.snapshot
  • bin/snapshots/kernel-service.dart.snapshot

https://chrome-infra-packages.appspot.com/p/flutter/dart-sdk/windows-amd64/+/git_revision:d916a5f69a486de98316900f19ef0ff46834b03d
https://storage.cloud.google.com/dart-archive/channels/beta/raw/hash/d916a5f69a486de98316900f19ef0ff46834b03d/sdk/dartsdk-windows-x64-release.zip
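
For anyone who wants to check this locally, here is a minimal sketch of one way to compare two builds of the same revision. It assumes both SDK archives have been unpacked into separate directories. The function names and the directory layout are illustrative and are not part of any Dart tooling. It hashes each snapshot listed above and reports the ones whose bytes differ.

```python
import hashlib
from pathlib import Path

# Snapshot paths taken from the list in this issue, relative to the SDK root.
SNAPSHOTS = [
    "bin/snapshots/analysis_server.dart.snapshot",
    "bin/snapshots/dart2js.dart.snapshot",
    "bin/snapshots/dartdevc.dart.snapshot",
    "bin/snapshots/kernel-service.dart.snapshot",
]


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_snapshots(sdk_a: Path, sdk_b: Path, names=SNAPSHOTS) -> list[str]:
    """Return the snapshot paths whose bytes differ between two SDK trees.

    sdk_a and sdk_b are hypothetical directories holding two unpacked builds
    of the same revision. A missing file raises FileNotFoundError.
    """
    return [n for n in names if sha256_of(sdk_a / n) != sha256_of(sdk_b / n)]
```

An empty result from `differing_snapshots(Path("build1/dart-sdk"), Path("build2/dart-sdk"))` (placeholder paths) would mean the listed snapshots are byte-identical across the two builds.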

@rmacnak-google any ideas?

athomas avatar Oct 11 '24 08:10 athomas

Also, it's not all snapshots in the SDK, just some.

athomas avatar Oct 11 '24 08:10 athomas

IA32 is special because it doesn't support AppJIT (for the same reason it doesn't support AOT: our IA32 code isn't relocatable). So this is probably non-determinism in the AppJIT training. It could be an issue in the VM, or it could be that the training programs are non-deterministic.

rmacnak-google avatar Oct 14 '24 16:10 rmacnak-google

@athomas how critical is it for these snapshots to be deterministic? Variation between training runs potentially causes this non-determinism. Doing a training run with just --help might fix it, but that would not be ideal. We are also working towards switching all these snapshots to AOT snapshots, and maybe that is the right fix for this.

a-siva avatar Oct 16 '24 20:10 a-siva

If we think we'll have the AOT snapshots in a reasonable timeframe, then I'd rather we go for that. I don't know how frequently this will still happen in the release process (I implemented some retries to mitigate this failure mode), and there is a workaround (bump the version, create a new release).

athomas avatar Oct 21 '24 09:10 athomas

This reproduces on Linux.

rmacnak-google avatar Oct 22 '24 17:10 rmacnak-google

Now I only observe non-determinism for the analysis server snapshot. I see that during its training run, it uses timers, which would cause non-determinism.
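
To illustrate why timers matter here, the following is a toy analogy in Python, not the VM's actual training mechanism. All names in it are hypothetical. The "training run" does work until a time budget expires and records how often each function ran. If a snapshot captures state that depends on what ran during training, then a profile that depends on clock behavior makes the output differ from run to run.

```python
from collections import Counter
from typing import Callable


def training_run(now: Callable[[], float], budget: float = 1.0) -> Counter:
    """Toy 'training run': do work until a time budget expires.

    Returns a profile counting how many times each (hypothetical) function ran.
    The amount of work done depends entirely on how the clock advances.
    """
    profile = Counter()
    deadline = now() + budget
    while now() < deadline:
        profile["handle_request"] += 1
    # The timeout handler runs exactly once, after the deadline has passed.
    profile["on_timeout"] += 1
    return profile


def fake_clock(step: float) -> Callable[[], float]:
    """Deterministic stand-in for a wall clock; advances by `step` per call."""
    t = 0.0

    def now() -> float:
        nonlocal t
        t += step
        return t

    return now
```

With a fixed fake clock the profile is reproducible, while two clocks that advance at different rates (standing in for machine load or scheduling jitter) give different profiles. Passing a real clock such as `time.monotonic` would be expected to give counts that vary between runs.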

rmacnak-google avatar Oct 22 '24 23:10 rmacnak-google

Now that the analysis server is moving (or has moved) to an AOT snapshot, is this still an issue?

bwilkerson avatar Aug 01 '25 18:08 bwilkerson

Based on the lack of response, I'm going to close this as completed. If there's still work to do here please re-open the issue.

bwilkerson avatar Aug 06 '25 20:08 bwilkerson