redpanda icon indicating copy to clipboard operation
redpanda copied to clipboard

tests: make redpanda methods more thread-safe

Open andrwng opened this issue 2 years ago • 4 comments

Cover letter

Several methods that run for a particular node check that that node is started. This is somewhat reasonable as we typically expect to want to run get info for just the started nodes. However, this doesn't play well if calling any of these concurrently with a restart (removes from the list of started nodes, then re-adds).

This commit makes the methods more thread-safe by using the immutable full set of nodes, rather than just the started nodes in assertions.

Fixes #5827

Backport Required

  • [x] not a bug fix
  • [ ] papercut/not impactful enough to backport
  • [ ] v22.2.x
  • [ ] v22.1.x
  • [ ] v21.11.x

Release notes

andrwng avatar Aug 04 '22 05:08 andrwng

CI failure is https://github.com/redpanda-data/redpanda/issues/5858

andrwng avatar Aug 04 '22 23:08 andrwng

I have a question isn't the assertion in redpanda.py:394 incorrect, it seems to me that even if node is stopped we should be able to access its address ?

mmaslankaprv avatar Aug 05 '22 06:08 mmaslankaprv

I have a question isn't the assertion in redpanda.py:394 incorrect, it seems to me that even if node is stopped we should be able to access its address ?

Ah yea that's a good point. The calls should work regardless of whether they're started. I'll update this so it just checks that the node is in the Redpanda cluster.

andrwng avatar Aug 05 '22 15:08 andrwng

I updated the ones that seem like they should work. We do lose out on some developer-error-catching if we are unintentionally calling these methods with stopped nodes, but I think the calls should still return the correct thing.

andrwng avatar Aug 05 '22 18:08 andrwng

sorry @andrwng this got lost in the shuffle. looks like there is conflict

dotnwat avatar Aug 17 '22 23:08 dotnwat

@dotnwat Thanks for the look! Fixed

andrwng avatar Aug 17 '22 23:08 andrwng

@andrwng please report the CI failures here

dotnwat avatar Aug 24 '22 19:08 dotnwat

Thanks for the reminder, been retrying on what appear to be unrelated issues, but wanted at least a clean run since this touches some widely used methods:

  • 8/18: https://github.com/redpanda-data/redpanda/issues/5884
  • 8/19: assertion that was removed by https://github.com/redpanda-data/redpanda/commit/62c2ea43c5723d6b160e9da234c7f872555c54f8
  • 8/24: out of space issues, https://github.com/redpanda-data/vtools/issues/783 seems to remain at large

andrwng avatar Aug 24 '22 19:08 andrwng