Non-deterministic snapshots on Windows x64
https://ci.chromium.org/ui/p/dart-internal/builders/ci/dart-sdk-win-beta/165/overview exposed non-deterministic Windows x64 snapshots (strangely, IA32 snapshots seemed to be deterministic).
Non-deterministic snapshots:
- bin/snapshots/analysis_server.dart.snapshot
- bin/snapshots/dart2js.dart.snapshot
- bin/snapshots/dartdevc.dart.snapshot
- bin/snapshots/kernel-service.dart.snapshot
https://chrome-infra-packages.appspot.com/p/flutter/dart-sdk/windows-amd64/+/git_revision:d916a5f69a486de98316900f19ef0ff46834b03d
https://storage.cloud.google.com/dart-archive/channels/beta/raw/hash/d916a5f69a486de98316900f19ef0ff46834b03d/sdk/dartsdk-windows-x64-release.zip
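For anyone trying to reproduce this locally, here is a minimal sketch of a determinism check, assuming two builds of the same revision have been unpacked into separate SDK directories. The `bin/snapshots` layout comes from the paths listed above; the function names are hypothetical and are not part of the Dart SDK tooling. It only reports which files differ byte-for-byte, not why they differ.

```python
import hashlib
import sys
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def differing_snapshots(sdk_a: Path, sdk_b: Path, subdir: str = "bin/snapshots") -> list[str]:
    """List snapshot files (relative paths) whose contents differ between two builds.

    Files that exist in only one of the two builds are reported as well.
    """
    dir_a, dir_b = sdk_a / subdir, sdk_b / subdir
    names_a = {p.relative_to(dir_a).as_posix() for p in dir_a.rglob("*") if p.is_file()}
    names_b = {p.relative_to(dir_b).as_posix() for p in dir_b.rglob("*") if p.is_file()}
    diffs = []
    for name in sorted(names_a | names_b):
        if name not in names_a or name not in names_b:
            diffs.append(name)  # present in only one build
        elif sha256_of(dir_a / name) != sha256_of(dir_b / name):
            diffs.append(name)  # same file, different bytes
    return diffs


if __name__ == "__main__":
    # Usage: python <this script> <sdk-dir-1> <sdk-dir-2>
    for name in differing_snapshots(Path(sys.argv[1]), Path(sys.argv[2])):
        print(f"non-deterministic: {name}")
```

Hashing whole files keeps the check simple and independent of the snapshot format; narrowing down *where* inside a snapshot the bytes diverge would need a binary diff or VM-specific tooling.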
@rmacnak-google any ideas?
Also, not all snapshots in the SDK are affected, just some.
IA32 is special because it doesn't support AppJIT (for the same reason it doesn't support AOT: our IA32 code isn't relocatable). So this is probably non-determinism in the AppJIT training. It could be an issue in the VM, or it could be that the training programs are non-deterministic.
@athomas, how critical is it for these snapshots to be deterministic? Variation in the training run potentially leads to this non-determinism. Doing a training run with just --help might fix it, but that would not be ideal.
We are also working toward switching all these snapshots to AOT snapshots, and maybe that is the right fix.
If we think we'll have the AOT snapshots in a reasonable timeframe, then I'd rather we go for that. I don't know how frequently this will still happen in the release process (I implemented some retries to mitigate this failure mode), and there is a workaround: bump the version and create a new release.
This reproduces on Linux.
Now I only observe non-determinism in the analysis server snapshot. Its training run uses timers, which would cause non-determinism.
Now that the analysis server is moving (or has moved) to an AOT snapshot, is this still an issue?
Based on the lack of response, I'm going to close this as completed. If there's still work to do here, please reopen the issue.