neon
neon copied to clipboard
Flakiness in test_sharding_split_smoke
Example: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-6805/7957035402/index.html#suites/140824de6e814b5b1ae2b622c3f67840/6cd46f9911ed5b0f
In that run, the compute hook (local version, using neon_local Endpoint) is hanging, causing migration to time out.
Having fixed spurious reconciles, the logs are cleaner in this failure: https://neon-github-public-dev.s3.amazonaws.com/reports/pr-6828/7964818040/index.html#/testresult/dd6a70cde8671f8
In this example we see an error configuring compute node, but we're only giving it 2 seconds to complete because the earlier part of the migration (switching origin to stale mode) took 8 seconds. So it may be that we're just not giving it enough time, and the apparent failure of compute configuration is actually just the compute notification taking a while (although I'm still a bit concerned that the compute configuration in a neon_local environment isn't sub-second, it shouldn't be slow).
zero flakes of this in last 24h -- I suspect that #6814 made it much much rarer.
Recently it started to re-occur again:
edit: #7201