cortex icon indicating copy to clipboard operation
cortex copied to clipboard

Add support for exclude_alerts flag in ListRules API

Open rajagopalanand opened this issue 1 year ago • 5 comments

Which issue(s) this PR fixes: Fixes #5989

Checklist

  • [x] Tests updated
  • [ ] Documentation added
  • [x] CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

rajagopalanand avatar Jun 12 '24 20:06 rajagopalanand

  • no unit tests?
  • The title of the PR is not helpful. It should indicate what the PR is doing instead of a link to a issue. I can find the link to the issue in the body of the PR if I needed.

rapphil avatar Jun 12 '24 20:06 rapphil

Will add tests

rajagopalanand avatar Jun 13 '24 12:06 rajagopalanand

I think the test is flaky. Can we fix them?

--- FAIL: TestGetRules (10.23s)
    --- FAIL: TestGetRules/No_Sharding_with_Rule_Type_Filter (0.24s)
        ruler_test.go:362: 
            	Error Trace:	/__w/cortex/cortex/pkg/ruler/ruler_test.go:362
            	            				/__w/cortex/cortex/pkg/ruler/ruler_test.go:922
            	            				/__w/cortex/cortex/pkg/ruler/ruler_test.go:872
            	            				/__w/cortex/cortex/pkg/ruler/ruler_test.go:904
            	Error:      	"0" is not greater than or equal to "1"
            	Test:       	TestGetRules/No_Sharding_with_Rule_Type_Filter

yeya24 avatar Jun 18 '24 23:06 yeya24

Previously the Ruler tests did not look for alerts firing. With this new filter, the tests look for active alerts and this requires extra time for the Ruler to load the rules and evaluate the rule groups at least twice. In unit tests, there is not an obvious way to know if a Ruler has started evaluating rule groups. Integration tests call the metrics endpoint to know if a particular rule group has been evaluated. I could cast the manager interface to the concrete DefaultMultiTenantManager and access user registries. For now, I added some extra sleep time to allow Rulers to evaluate the rule groups

rajagopalanand avatar Jun 19 '24 13:06 rajagopalanand

In addition, TestRing_ShuffleShardWithLookback_CorrectnessWithFuzzy is failing every once in a while. Here is an example of a test failure. I ran the tests with some print statements locally and with enough # of runs, the error occurred on my local machine

instance-64 zone-1 zone-2 instance-44
    --- FAIL: TestRing_ShuffleShardWithLookback_CorrectnessWithFuzzy/num_instances_=_60,_num_zones_=_3,_stable_sharding_=_false (0.03s)
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
	panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x1bd7f83]

goroutine 69 [running]:
testing.tRunner.func1.2({0x1d9bfa0, 0x2d156f0})
	/usr/local/Cellar/go/1.21.3/libexec/src/testing/testing.go:1545 +0x238
testing.tRunner.func1()
	/usr/local/Cellar/go/1.21.3/libexec/src/testing/testing.go:1548 +0x397
panic({0x1d9bfa0?, 0x2d156f0?})
	/usr/local/Cellar/go/1.21.3/libexec/src/runtime/panic.go:914 +0x21f
github.com/cortexproject/cortex/pkg/ring.(*MinimizeSpreadTokenGenerator).GenerateTokens(0xc00059ca90, 0xc0001ef160, {0xc0004f4090, 0xb}, {0xc0004f40a0, 0x6}, 0x80, 0x1)
	/Users/anrajag/workplace/rajagopalanand-cortex/pkg/ring/token_generator.go:134 +0x883
github.com/cortexproject/cortex/pkg/ring.TestRing_ShuffleShardWithLookback_CorrectnessWithFuzzy.func1(0xc000503040)
	/Users/anrajag/workplace/rajagopalanand-cortex/pkg/ring/ring_test.go:2497 +0xce6
testing.tRunner(0xc000503040, 0xc00070ec60)
	/usr/local/Cellar/go/1.21.3/libexec/src/testing/testing.go:1595 +0xff
created by testing.(*T).Run in goroutine 11
	/usr/local/Cellar/go/1.21.3/libexec/src/testing/testing.go:1648 +0x3ad

The above seg fault lines #s do not match because I added some log statements. Not sure if this a problem with the test or if there is an edge case bug in MinimizeSpreadTokenGenerator.GenerateTokens

rajagopalanand avatar Jun 19 '24 16:06 rajagopalanand

It would be great to revisit this PR and get it merged

yeya24 avatar Jul 16 '24 17:07 yeya24

Yes, will address it this week

rajagopalanand avatar Jul 16 '24 19:07 rajagopalanand

Added integ tests and removed unit tests

rajagopalanand avatar Jul 30 '24 16:07 rajagopalanand

LGTM!

alanprot avatar Jul 30 '24 17:07 alanprot