
Bug: Recorded tests with KeyVaults time out

Open • theunrepentantgeek opened this issue 4 months ago • 1 comment

Describe the bug

We have several tests that include KeyVaults configured with purgeThenCreate for their creation mode.

The polling interval for the purge operation is fixed at 10s, and the extension has no way to distinguish between live, test-playback, and test-recording modes.

This isn't a problem when the purge is quick, as two or three polls easily fit within the 120s timeout used for playback tests. When purging takes considerable time, however, test replay fails because the operation does not complete within that timeout. I recently hit this with Test_AKS_ManagedCluster_20231001_CRUD, where the operation was polled 85 times (85 × 10s ≈ 14m, far beyond the 120s limit).

Expected behavior

Ideally, we'd like polling operations in extensions to run faster during test playback, but it may be challenging to distinguish between live, test-playback, and test-recording.

Another solution would be to "reduce" the operation in the recordings by stripping out most of the "waiting" polls and retaining just the start and end.
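A minimal sketch of that reduction, assuming recordings can be post-processed as an ordered list of interactions. The `interaction` type, its `InProgress` flag, and the placeholder URLs are hypothetical simplifications, not the real recorder's cassette format. Each run of consecutive identical "still waiting" polls is collapsed to its first entry, so replay sees the start request, one waiting poll, and the final result.

```go
package main

import "fmt"

// interaction is a hypothetical, simplified stand-in for one recorded
// HTTP request/response pair; it is NOT the real recorder type.
type interaction struct {
	Method     string
	URL        string
	InProgress bool // true if the response says the operation is still running
}

// collapsePolls keeps the first poll of each run of consecutive, identical
// in-progress polls and drops the rest. All other interactions are kept.
func collapsePolls(in []interaction) []interaction {
	out := make([]interaction, 0, len(in))
	for i, cur := range in {
		if i > 0 && cur.InProgress {
			prev := in[i-1]
			if prev.InProgress && prev.Method == cur.Method && prev.URL == cur.URL {
				continue // redundant "still waiting" poll
			}
		}
		out = append(out, cur)
	}
	return out
}

func main() {
	// Placeholder recording: the purge request, 84 waiting polls, then completion.
	recording := []interaction{{Method: "POST", URL: "/placeholder/purge", InProgress: true}}
	for i := 0; i < 84; i++ {
		recording = append(recording,
			interaction{Method: "GET", URL: "/placeholder/operation", InProgress: true})
	}
	recording = append(recording,
		interaction{Method: "GET", URL: "/placeholder/operation", InProgress: false})

	reduced := collapsePolls(recording)
	fmt.Println(len(recording), "->", len(reduced))
}
```

With 85 polls in the original recording, the reduced version keeps three interactions. Keeping one waiting poll (rather than none) means the replayed client still exercises its "not done yet, wait and retry" path once.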

To Reproduce

Re-record the test Test_AKS_ManagedCluster_20231001_CRUD, then attempt playback.

theunrepentantgeek, Nov 27 '25 21:11

Possible fixes:

  • Could read retry speed from an environment variable
  • Could read it from a static variable location (similar to version)
  • Could shrink the number of recorded interactions (which we've discussed elsewhere). This doesn't make any individual attempt faster, but it turns the 85 polls into about 3
  • Could pass it via the context too (possibly?)
  • Could have a generic options/configuration struct that is passed to all extensions (possibly smuggled on ctx if we wanted, or just passed as a plain parameter)

matthchr, Dec 01 '25 23:12