delve icon indicating copy to clipboard operation
delve copied to clipboard

linux/riscv64 test failures

Open aarzilli opened this issue 1 year ago • 11 comments

The linux/riscv64 builder is not picking up linux/riscv64 builds. Looking at teamcity the problem seems to be this:

Unmet requirements: docker.server.version exists 

aarzilli avatar Oct 13 '24 05:10 aarzilli

cc @lrzlin

aarzilli avatar Oct 13 '24 05:10 aarzilli

According to https://github.com/lawrenceching/gitbook/blob/master/teamcity-is-reporting-unmet-requirements-docker.server.version-exists.md, I start the docker deamon, now it's up and running on the linux/riscv64 builder, and I could find the Linux-openEuler24.09-riscv64 in compatible agents. Maybe we could trigger the builds manually and see what happens?

lrzlin avatar Oct 13 '24 07:10 lrzlin

Alright. The agent is picking up builds however there are some failing tests: https://delve.teamcity.com/viewLog.html?buildId=63555&tab=buildResultsDiv&buildTypeId=Delve_linux_riscv64_1_23

  • TestTraceDirRecursion: this does not concern me much, we can either disable it on linux/riscv64 or figure out where the CGO_CFLAGS are coming from, or just unset them somewhere on the test script
  • All the other ones TestIssue332_Part2, TestSkipPrologue and TestSkipPrologue2 have the same root cause, the prologue skipping code isn't working, i.e. probably the prologues set in riscv64_arch.go are wrong. This also reminds me that we should make a CL for the go compiler, so that it sets the prologue end marker correctly for riscv64.

aarzilli avatar Oct 13 '24 08:10 aarzilli

  • TestTraceDirRecursion: this does not concern me much, we can either disable it on linux/riscv64 or figure out where the CGO_CFLAGS are coming from, or just unset them somewhere on the test script

This seems to be related with async preempt mechanism of go, I tried disable it in delve but doesn't help, loong64 used to have similar issue, they finally fixed this by making CLs to the go compiler itself, so maybe we should do the same.

  • All the other ones TestIssue332_Part2, TestSkipPrologue and TestSkipPrologue2 have the same root cause, the prologue skipping code isn't working, i.e. probably the prologues set in riscv64_arch.go are wrong. This also reminds me that we should make a CL for the go compiler, so that it sets the prologue end marker correctly for riscv64.

Yes, I found this should also related with the go compiler, I tried loong64's delve pr on official go1.23.2 and their custom go1.22.6, the custom 1.22.6 compiler could pass these prologue skipping tests but the 1.23.2 couldn't.

lrzlin avatar Oct 13 '24 18:10 lrzlin

  • TestTraceDirRecursion: this does not concern me much, we can either disable it on linux/riscv64 or figure out where the CGO_CFLAGS are coming from, or just unset them somewhere on the test script

This seems to be related with async preempt mechanism of go, I tried disable it in delve but doesn't help, loong64 used to have similar issue, they finally fixed this by making CLs to the go compiler itself, so maybe we should do the same.

We could also simply adjust the test to ignore that message.

  • All the other ones TestIssue332_Part2, TestSkipPrologue and TestSkipPrologue2 have the same root cause, the prologue skipping code isn't working, i.e. probably the prologues set in riscv64_arch.go are wrong. This also reminds me that we should make a CL for the go compiler, so that it sets the prologue end marker correctly for riscv64.

Yes, I found this should also related with the go compiler, I tried loong64's delve pr on official go1.23.2 and their custom go1.22.6, the custom 1.22.6 compiler could pass these prologue skipping tests but the 1.23.2 couldn't.

It should be possible to make those tests pass without changing the compiler. I'd check the disassembler output for the functions where the detection is failing. As it is, if this test fails several other tests will fail sporadically.

aarzilli avatar Oct 13 '24 19:10 aarzilli

@aarzilli Hi, I found there are more tests failed after deteled the prologue related code, is that means at least it's partially correct?

Besides, by adding the prologue_end dwarf in src/cmd/internal/obj/riscv/obj.go like ppc64 and loong64, and TestIssue332_Part2, TestSkipPrologue and TestSkipPrologue2 could pass. So if we could use this workaround instead of using the prologue detection code, I'll make a CL to golang as soon as possible.

lrzlin avatar Oct 15 '24 04:10 lrzlin

@aarzilli Hi, I found there are more tests failed after deteled the prologue related code, is that means at least it's partially correct?

It could be. I compared it to the code in stacksplit and to what its generated by the compiler and I didn't see any resemblance but maybe I missed something. But a bad prologue detection code is going to make many of the tests in our suite fail non-deterministically so I would have to take samples before and after to be confident just by looking at test failures.

Besides, by adding the prologue_end dwarf in src/cmd/internal/obj/riscv/obj.go like ppc64 and loong64, and TestIssue332_Part2, TestSkipPrologue and TestSkipPrologue2 could pass. So if we could use this workaround instead of using the prologue detection code, I'll make a CL to golang as soon as possible.

The prologue end marker is not a workaround, it is the proper way to do it.

aarzilli avatar Oct 15 '24 06:10 aarzilli

Besides, by adding the prologue_end dwarf in src/cmd/internal/obj/riscv/obj.go like ppc64 and loong64, and TestIssue332_Part2, TestSkipPrologue and TestSkipPrologue2 could pass. So if we could use this workaround instead of using the prologue detection code, I'll make a CL to golang as soon as possible.

The prologue end marker is not a workaround, it is the proper way to do it.

Add prologue_end DWARF stmt for riscv64: https://go-review.googlesource.com/c/go/+/620295

lrzlin avatar Oct 15 '24 07:10 lrzlin

It seems like the agent is now offline?

aarzilli avatar Oct 23 '24 08:10 aarzilli

It seems like the agent is now offline?

Sorry, the server was down due to maintainance, I've restart the teamcity agent.

lrzlin avatar Oct 23 '24 08:10 lrzlin

Besides, by adding the prologue_end dwarf in src/cmd/internal/obj/riscv/obj.go like ppc64 and loong64, and TestIssue332_Part2, TestSkipPrologue and TestSkipPrologue2 could pass. So if we could use this workaround instead of using the prologue detection code, I'll make a CL to golang as soon as possible.

The prologue end marker is not a workaround, it is the proper way to do it.

Add prologue_end DWARF stmt for riscv64: https://go-review.googlesource.com/c/go/+/620295

Merged.

lrzlin avatar Oct 25 '24 22:10 lrzlin

Closing, the riscv64 builder no longer exists.

aarzilli avatar Feb 28 '25 09:02 aarzilli