flake: TestWorkspaceDeletionLeak
Describe the bug
The test TestWorkspaceDeletionLeak has been flaking.
I was aware that this could happen when I implemented the test and had planned a workaround, but that would require the goleak maintainers to merge an open PR. Details are here: https://github.com/kcp-dev/kcp/pull/3491#discussion_r2226127866
Instead, the test now uses require.EventuallyWithT (kcptestinghelpers.Eventually would always fail immediately for some reason), but the 30s timeout does not seem to be enough:
I0807 09:17:51.683088 39314 namespace_controller.go:194] "Namespace has been deleted" component="kcp" postStartHook="kcp-start-controllers" namespace="yef8oaknnwv5ohao|default"
{"level":"warn","ts":"2025-08-07T09:18:05.175445Z","caller":"fileutil/purge.go:80","msg":"failed to lock file","path":"/tmp/TestWorkspaceDeletionLeak3304689176/002/artifacts/etcd-server/member/wal/0000000000000000-0000000000000000.wal","error":"fileutil: file already locked"}
leak_test.go:99: found leaking goroutines: ...
leak_test.go:99:
Error Trace: /home/prow/go/src/github.com/kcp-dev/kcp/test/integration/workspace/leak_test.go:99
Error: Condition never satisfied
Test: TestWorkspaceDeletionLeak
Messages: eventually there will be no random goroutines running while checking for leaks
I0807 09:18:20.286940 39314 dynamic_serving_content.go:195] "Failed to remove file watch,
It's also not possible to just shut down the KCP server because that could hide potential leaks.
Simply ignoring every goroutine that is related to HTTP requests could also hide leaks, e.g. if an HTTP request is sent without a context and runs for a long time.
Steps To Reproduce
- Make a PR
- Wait for the test to fail randomly
- If it doesn't, retrigger the job until it does: https://prow.kcp.k8c.io/?job=pull-kcp-test-integration
Expected Behaviour
The test should not flake.
Additional Context
No response
/kind flake
https://s3.eu-west-1.amazonaws.com/prow-public-data/pr-logs/pull/kcp-dev_kcp/3565/pull-kcp-test-integration/1962839491542519808/build-log.txt?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=AKIAUXHT7IH25XHMMYM5%2F20250902%2Feu-west-1%2Fs3%2Faws4_request&X-Amz-Date=20250902T115100Z&X-Amz-Expires=600&X-Amz-SignedHeaders=host&X-Amz-Signature=cb52e9f180c03110c2ef1cab917bd84ced89f1f5028c896b92018872c861df3d
Issues go stale after 90d of inactivity.
After a further 30 days, they will turn rotten.
Mark the issue as fresh with /remove-lifecycle stale.
If this issue is safe to close now please do so with /close.
/lifecycle stale
/remove-lifecycle stale