client icon indicating copy to clipboard operation
client copied to clipboard

Flaky E2E test: e2e/TestDomain - create domain with TLS

Open dsimansk opened this issue 2 years ago • 9 comments

The e2e tests are intermittently failing on e2e/TestDomain. In a specific https test case. When the URL should contain https but is empty. It's most likely a timing issue with empty value being checked prematurely.

There's a wait loop to wait for URL to be populated before asserting it. It might be not working as expected.
https://github.com/knative/client/blob/2d0415a7951b99130ccb514fa0e3c5343338225c/test/e2e/domain_mapping_test.go#L136C1-L144C3

Error log: https://prow.knative.dev/view/gs/knative-prow/logs/continuous_client_main_periodic/1710944161479266304#

/kind good-first-issue

dsimansk avatar Oct 09 '23 12:10 dsimansk

/assign

xiangpingjiang avatar Oct 25 '23 16:10 xiangpingjiang

The original issue is not fixed. We need to further investigate & verify the wait loop is working as expected and maybe fix ti.

@xiangpingjiang I'll remove your previous assignment. But feel free to assign again and do a second pass if you are still interested.

dsimansk avatar Nov 23 '23 12:11 dsimansk

The original issue is not fixed. We need to further investigate & verify the wait loop is working as expected and maybe fix ti.

@xiangpingjiang I'll remove your previous assignment. But feel free to assign again and do a second pass if you are still interested.

@dsimansk Did you find that e2e/TestDomain not pass after https://github.com/knative/client/pull/1888 merged?

xiangpingjiang avatar Nov 23 '23 16:11 xiangpingjiang

The original issue is not fixed. We need to further investigate & verify the wait loop is working as expected and maybe fix ti. @xiangpingjiang I'll remove your previous assignment. But feel free to assign again and do a second pass if you are still interested.

@dsimansk Did you find that e2e/TestDomain not pass after #1888 merged?

Yes. E.g.: https://prow.knative.dev/view/gs/knative-prow/pr-logs/pull/knative_client/1901/integration-tests-latest-release_client_main/1729841947746504704

dsimansk avatar Nov 29 '23 13:11 dsimansk

/assign

xiangpingjiang avatar Nov 29 '23 15:11 xiangpingjiang

The suspicious part in test log is the following. See age: field is 1s, URL is still empty, but the test is already failed with missing string error.

There's still possibility to a short timeout (e.g. 3 - 5s) between create and verify operations. But I'd rather check first if the wait loop is working as expected.

domain_mapping_test.go:153: assertion failed: 
        Actual output: Name:       newdomain.com
        Namespace:  kne2etests21
        Age:        1s
        
        URL:  
        
        Reference:     
          APIVersion:  serving.knative.dev/v1
          Kind:        Service
          Name:        hello
        
        Conditions:  
          OK TYPE    AGE REASON
        
        Missing strings: https://newdomain.com/

dsimansk avatar Nov 30 '23 09:11 dsimansk

It could be a timing related issue. This may work better for the for loop

func domainDescribe(r *test.KnRunResultCollector, domainName string, tls bool) {
	k := test.NewKubectl(r.KnTest().Kn().Namespace())

	// Wait for Domain Mapping URL to be populated
	var url string
	timeout := time.After(30 * time.Second) // Adjust the timeout as needed

	for {
		select {
		case <-time.After(time.Second):
			out, err := k.Run("get", "domainmapping", domainName, "-o=jsonpath='{.status.url}'")
			assert.NilError(r.T(), err)

			if len(out) > 0 {
				url = out
				break
			}
		case <-timeout:
			assert.Fail(r.T(), "Timed out waiting for the domain mapping URL to be populated")
			return
		}
	}

	out := r.KnTest().Kn().Run("domain", "describe", domainName)
	r.AssertNoError(out)

	if tls {
		url = "https://" + domainName
	} else {
		url = "http://" + domainName
	}

	assert.Assert(r.T(), util.ContainsAll(out.Stdout, "Name", "Namespace", "URL", "Service", url))
}

By introducing this more robust waiting mechanism, you can mitigate timing issues and reduce the likelihood of intermittent test failures.

Ipppoooo avatar Dec 27 '23 20:12 Ipppoooo

Thanks @Ipppoooo , i think its worth a try. Maybe don't retry every second as this will cause 30 calls if this is a non-recoverable error. Maybe use e.g. 5seconds or an exponential backoff. Fancy to send in a PR ?

rhuss avatar Jan 10 '24 09:01 rhuss

hello @rhuss I think the reason maybe the break condition in for loop is len(out) > 0, but the assert condition is util.ContainsAll(out.Stdout, "Name", "Namespace", "URL", "Service", url).

It's different conditions.

What do you think?

xiangpingjiang avatar Jan 12 '24 06:01 xiangpingjiang