go icon indicating copy to clipboard operation
go copied to clipboard

cmd/asm: doesn't support PCALIGN on i386/amd64

Open piotr-sneller opened this issue 1 year ago • 1 comments

What version of Go are you using (go version)?

go version go1.19.2 windows/amd64

Does this issue reproduce with the latest release?

yes

What did you do?

Tried to compile the following assembly function:

TEXT test(SB), NOFRAME|NOSPLIT, $0-0 PCALIGN $2 RET

What did you expect to see?

I expected the function to compile correctly with the RET aligned to a 2-byte boundary.

What did you see instead?

asm: asmins: missing op 00000 (<unknown line number>)   PCALIGN $2
asm: assembly failed

Per my google research, it appears the directive PCALIGN is recognized by the assembler, but it is not implemented on the x86/x64 targets. Could you please implement this feature, as it is essential to high-performance assembly code? It should be working both with the TEXT (function/label alignment) and DATA (e.g. AVX512 64-byte constants) sections.

piotr-sneller avatar Oct 28 '22 15:10 piotr-sneller

cc @golang/compiler

heschi avatar Oct 28 '22 17:10 heschi

I believe that opcode is only implemented on arm64 and ppc64 so far.

It shouldn't be hard to do - we already pad on x86 for various alignment restrictions.

randall77 avatar Oct 30 '22 22:10 randall77

I am not able to help here, as my proficiency in the Go internals is insufficient. I would be enormously grateful though if someone with the required skills decided to contribute the necessary patch.

piotr-sneller avatar Nov 02 '22 08:11 piotr-sneller

I have looked into it and I was able to actually add the support into span6() function (x86/amd64 backend), looking like this:

			if p.As == obj.APCALIGN {
				v := -c & int32(p.From.Offset-1)
				s.Grow(int64(c) + int64(v))
				fillnop(s.P[c:], int(v))
				c += v
				pPrev = p
				continue
			}

I'm just wondering, would this change be acceptable, should I try and submit a PR regarding this feature? How to test it?

I have personally no intention about modifying anything in the go language itself. I just think that it would be nice to be able to align stuff at asm level as it's just beneficial to have this kind of control be it hot loops or data for AVX2/AVX512. I think aligning everything makes no sense, only the stuff that you want or that improves based on profiling.

The problem I have at the moment is that if you have .s file that implements bunch of functions at asm level, you can change something that would regress something completely unrelated, because some labels shift their offsets and end up in a different cache lines, etc... The difference of this in our software could be up to 1 GB/s throughput on a 16 core machine, which is normally able to process around 37.5GB/s of data.

petr-sneller avatar Jan 10 '23 18:01 petr-sneller

Seems reasonable to me.

Testing is tricky. Normally we test non-semantics-changing things like this in test/codegen, but that doesn't have any assembly tests currently. Maybe something in cmd/asm/internal/asm? It would be worth at least having a test that runs code that needs padding, just to make sure the nops are correct.

Note that if you want alignments more than 32 bytes, just aligning within a function isn't enough. You'll need to increase the alignment of function starts, which will be trickier.

randall77 avatar Jan 10 '23 19:01 randall77

Can the alignment of a function start be increased automatically when a PCALIGN is used within that function? I have seen the alignment info in FuncInfo, but I have no idea whether a greater number than 32 would be honored and actually when to update it so it gets honored.

petr-sneller avatar Jan 10 '23 19:01 petr-sneller

Code alignment up to 32 bytes would already be great progress, but data alignment to a 64-byte boundary is critical to AVX512 code. The lack of it essentially prohibits the use of VMOVDQA instruction family. This, in turn, halves the available data cache bandwidth, as the CPU needs to issue two memory operations under the hood and combine the results.

piotr-sneller avatar Jan 10 '23 19:01 piotr-sneller

You're welcome to send a patch. Thanks! If you have not, please see the contribution guide for how to contribute. https://go.dev/doc/contribute

For testing, there are similar tests for other architectures https://cs.opensource.google/go/go/+/master:src/cmd/internal/obj/ppc64/asm_test.go;l=331 https://cs.opensource.google/go/go/+/master:src/cmd/internal/obj/arm64/asm_arm64_test.go;l=105 You probably can do something similar.

For alignment larger than the default function alignment, I think you can increase the function alignment like https://cs.opensource.google/go/go/+/master:src/cmd/internal/obj/arm64/asm7.go;l=1080

cherrymui avatar Jan 10 '23 19:01 cherrymui

Change https://go.dev/cl/511662 mentions this issue: cmd/asm: add PCALIGN support for i386/amd64

gopherbot avatar Jul 22 '23 02:07 gopherbot