proposal: encoding/binary: add var NativeEndian ByteOrder = LittleEndian or BigEndian
I'd like to revisit #35398 (proposal: encoding/binary: add NativeEndian). Sorry. I know the history & people's opinions already. But I come with a story!
Go 1.19 added support for GOARCH=loong64 (https://go.dev/doc/go1.19#loong64).
So naturally somebody wanted to compile tailscale.com/cmd/tailscaled with GOOS=linux GOARCH=loong64. Compilation failed.
It turned out we have four different native endian packages in our dependency tree:
- https://pkg.go.dev/tailscale.com/util/endian (const Big = false, var Native = binary.LittleEndian)
- https://pkg.go.dev/github.com/josharian/native (var Endian = binary.LittleEndian)
  - example dep path: "tailscale.com/net/interfaces" => "github.com/mdlayher/netlink" => "github.com/mdlayher/netlink/nlenc" => "github.com/josharian/native"
- https://pkg.go.dev/github.com/u-root/uio/ubinary (var NativeEndian = binary.LittleEndian)
  - example dep path: "tailscale.com/wgengine/router" => "tailscale.com/net/dns" => "tailscale.com/net/tstun" => "github.com/insomniacslk/dhcp/dhcpv4" => "github.com/u-root/uio/uio" => "github.com/u-root/uio/ubinary"
- wireguard-go's upcoming wireguard/endian (in review at https://github.com/WireGuard/wireguard-go/pull/64/files#diff-8b7f475f3cbc65d1d51dd1c959d0232aad2aed0fbf7967207a97bfb0f94abdba) because it doesn't take external dependencies outside of std or golang.org/x/*
So we had to update all four, along with the various requisite go.mod bumps.
Some observations:
- they're all very similar
- people don't like taking dependencies on other big packages. @josharian's
github.com/josharian/native that he mentioned in https://github.com/golang/go/issues/35398#issuecomment-675015994 is closest, but lacks the constant that we ended up needing in Tailscale. So everybody makes their own local copies instead. That works until a new GOARCH comes along. Maybe that's rare enough? But I'm sure more riscv* variants will come along at some point.
x/sys/cpu already has this code:
https://cs.opensource.google/go/x/sys/+/refs/tags/v0.3.0:cpu/byteorder.go;l=44
And it has even more GOARCH values (for gccgo) than any other package has!
So everybody has a different subset of GOARCH values it seems.
I know people don't want to encourage thinking about or abusing endianness, but it's a reality when talking to kernel APIs. And this is kinda ridiculous, having this duplicated incompletely everywhere.
It would've been neat if Go could've added loong64 and had a bunch of code in the Go ecosystem just work right away and not require adjusting build tags.
Alternatively, if std and x/sys/cpu are too objectionable: what about new build tags?
/cc @josharian @mdlayher @hugelgupf @zx2c4 @yetist @jwhited @raggi
Yes please. Another option: unsafe.NativeEndian to imply that you shouldn't use this unless you are aware of the implications?
CC @robpike
@josharian's github.com/josharian/native that he mentioned in https://github.com/golang/go/issues/35398#issuecomment-675015994 is closest, but lacks the constant that we ended up needing in Tailscale.
FWIW, I know the author of that package, and he's amenable to adding a constant.
But an x/sys/cpu or build tag option would be better. :)
You know my position about byte order. Yes, it's sometimes necessary but it's used far more than it should be. I'd prefer to take up @josharian's first suggestion and not have the standard library promote the concept.
I'd prefer to take up @josharian's first suggestion and not have the standard library promote the concept.
What about const IsBigEndian = false in x/sys/cpu? That's not the standard library. And that cpu package is very much about telling you what the CPU does.
I'm not going to die on this hill, but know too that big/little endian is not the full story for some architectures. Maybe Go will never support a nutty layout, but who knows?
In other words, is a boolean sufficient? It might be nowadays, but I honestly don't know. It's probably sufficient to support the architectures encoding/binary does, so maybe it's enough.
At least your suggestion doesn't promote the idea of "native" byte order explicitly.
But: A single third-party package that just told you the byte order by computing it (this can only be done unsafely, which is a BIG part of why I dislike the concept) seems like the real right answer to me.
Is part of the problem here more one of naming, e.g. "NativeEndian", perhaps something closer to what this is used for in most good cases is the systems ABI endianness. I'm not sure of a good short name to describe that, but perhaps if we had one it would be less objectionable?
@raggi Not especially. It's not what it's called, it's what it represents, an internal detail that is almost always (not always, but almost always) irrelevant. A long history of bad C code has taught people that it matters more than it does.
Only unsafe code can care.
I made this exact mistake recently. I'm working on a pure Go Wayland protocol implementation. Wayland, despite being essentially a network protocol, is constrained to Unix domain sockets for various practical reasons. Since everything's on the same machine anyways, I assume the developers decided to just use native endianness for data being sent.

Early on in my project, I created a global var byteOrder binary.ByteOrder that I then set in an init() by detecting the native endianness via unsafe. It was only later that I realized that this was 100% pointless, serving literally no purpose other than hurting the performance ever so slightly. It's pointless because all it's actually doing is getting the raw bytes of various types. This is just as easily, and far more efficiently, accomplished by, for example, doing something like *(*[4]byte)(unsafe.Pointer(&v)) to get the raw bytes of either an int32 or uint32. I wrote a couple of functions, such as func Bytes[T ~int32 | ~uint32](v T) [4]byte, to simplify it a bit further, and that was the end of it.
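A helper with that signature might look like the following sketch (the signature is quoted from the comment; the body is an assumed implementation):

```go
package main

import (
	"fmt"
	"unsafe"
)

// Bytes returns the raw in-memory (native-order) bytes of a 32-bit value.
// The cast is alignment-safe here because v is a local variable, which the
// compiler aligns appropriately for its type.
func Bytes[T ~int32 | ~uint32](v T) [4]byte {
	return *(*[4]byte)(unsafe.Pointer(&v))
}

func main() {
	// On a little-endian machine this prints [4 3 2 1];
	// on a big-endian one, [1 2 3 4].
	fmt.Println(Bytes[uint32](0x01020304))
}
```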
I have to agree with @robpike. I used to wonder why this was missing, but now I just don't see the point. All it would do is obscure stuff that probably should be unsafe. I took a look through several packages that use the above referenced native endianness packages and none of them require a native endianness. For example, Tailscale doesn't use it much, and every usage could easily be replaced with an unsafe conversion; the same goes for Cloudflare's usage of it, and I'm not even sure what's going on in this one. On the other hand, wireguard-go has a discussion that talks about unsafe casting, but only seems to consider casting an entire struct as an alternative instead of casting just the slices that they're already handling individually anyways, which seems a bit odd to me if the goal is to remove the dependency.
Unless I'm missing something, none of these seem like they actually require any kind of NativeEndian and their insistence on using a library to detect how their system reads and writes memory instead of just reading and writing that memory and letting the computer do what it's designed to do is causing issues that they just simply wouldn't otherwise have. It's also notable that every usage that I saw directly calls the methods on the ByteOrder implementation, rather than passing it to binary.Read() or binary.Write(). Calling the methods directly is even more pointless, since you have to know the size in advance anyways.
Endianness is generally not something that should affect anything outside of network protocols and file formats. In other words, it only really matters for stuff where data could be read by two processes that might be running on different computers. In those cases, the endianness needs to be explicitly declared as something standardized between the two, hence binary.LittleEndian and binary.BigEndian. I don't know what the point of a native endian implementation of binary.ByteOrder would be other than to confuse people into thinking that there's a similarity in the usage between native endianness and predefined endianness.
You have to be careful with alignment on some archs with that sort of unsafe cast.
Typically what I've done in C is memcpy to the destination and let gcc figure out whether that has to be byte by byte or can be word-wise, with load/store folding removing the inefficiency.
You have to be careful with alignment on some archs with that sort of unsafe cast.
How so, and how does a NativeEndian implementation help with that? If you can reference the memory, the computer shouldn't care what type you're treating the bytes as I would think. For example, what's the difference between
v.hdrLen = endian.Native.Uint16(b[2:])
and
v.hdrLen = *(*uint16)(unsafe.Pointer(&b[2]))
Genuinely curious. It's reading the same bytes in the same order. If there are issues with reading those bytes that seems like it should be something the compiler should handle transparently.
@DeedleFake the compiler only coalesces byte-by-byte memory reads/writes if it knows that it is safe to do so, based on the architecture, alignment, whatnot. The unsafe version says to the compiler: just do it, whether or not it is safe.
What actually happens if you attempt to do
v.hdrLen = *(*uint16)(unsafe.Pointer(&b[2]))
on something like MIPS and the alignment isn't correct? Do you just get junk data? Does it panic? That's not something I've run into before, despite working with some relatively low-level stuff on MIPS a fair while back.
Edit: Never mind. I think I answered my own question.
It traps into the kernel, where the kernel decodes the offending instruction and emulates it by doing a byte-by-byte read. You can also tell the kernel to SIGBUS your process:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/mips/kernel/unaligned.c
It is still possible to do this with unsafe without regard to endianness. I believe the following alternative is not subject to alignment bugs:
copy(unsafe.Slice((*byte)(unsafe.Pointer(&v.hdrLen)), unsafe.Sizeof(v.hdrLen)), b[2:])
(https://go.dev/play/p/ig26Po_yPnr)
Yea this is the memcpy trick I mentioned. A good compiler should produce optimal code here, since it knows whether the arch can do the fixed-length copy wordwise or not.
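Wrapped in a function, the copy trick from the previous comment might look like this sketch (loadU16 is a hypothetical helper name, not from any of the packages above):

```go
package main

import (
	"fmt"
	"unsafe"
)

// loadU16 reads a native-order uint16 from b without assuming that &b[0]
// is 2-byte aligned: copy lets the compiler choose a safe access width
// for the architecture, often still a single word-wise load.
func loadU16(b []byte) uint16 {
	var v uint16
	copy(unsafe.Slice((*byte)(unsafe.Pointer(&v)), unsafe.Sizeof(v)), b)
	return v
}

func main() {
	b := []byte{0xff, 0x34, 0x12}
	// Reading from an odd offset is fine on every GOARCH,
	// unlike a direct *(*uint16) cast.
	fmt.Printf("%#x\n", loadU16(b[1:]))
}
```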
Maybe an explicit unsafe function or two for just this purpose would make sense, then? It could potentially do the copy automatically if necessary and not do it if it doesn't have to, with the compiler optimizing it away if it can. Something like
package unsafe
func Bytesof(x ArbitraryType) [...]byte // x must be numeric. Returns a byte array, not a slice.
func Valueof(p ArbitraryType, x []byte) // p must be a pointer to a numeric type.
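The proposed helpers could be approximated today with ordinary generics instead of the compiler-magic ArbitraryType; this sketch returns a slice rather than the proposed fixed-size array, and the names and constraints are illustrative only:

```go
package main

import (
	"fmt"
	"unsafe"
)

// Bytesof returns the native-order bytes of a fixed-size unsigned integer.
func Bytesof[T ~uint16 | ~uint32 | ~uint64](v T) []byte {
	b := make([]byte, unsafe.Sizeof(v))
	copy(b, unsafe.Slice((*byte)(unsafe.Pointer(&v)), unsafe.Sizeof(v)))
	return b
}

// Valueof fills *p from the native-order bytes in b.
func Valueof[T ~uint16 | ~uint32 | ~uint64](p *T, b []byte) {
	copy(unsafe.Slice((*byte)(unsafe.Pointer(p)), unsafe.Sizeof(*p)), b)
}

func main() {
	b := Bytesof(uint32(0x01020304))
	var v uint32
	Valueof(&v, b)
	fmt.Printf("%#x\n", v) // round-trips regardless of endianness: 0x1020304
}
```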
This comes up so often that I think we should just add it. We can add a nice comment on it explaining that in many cases there may be better ways to write the code, but lots of people want to write the code this way, sometimes even for good reasons, and we are making things harder, not easier, for them.
We are stepping on developers' toes instead of letting them stand on our shoulders.
This proposal has been added to the active column of the proposals project and will now be reviewed at the weekly proposal review meetings. β rsc for the proposal review group
Here is a concrete proposal.
package binary
var NativeEndian nativeEndian
type nativeEndian ... // same implementation as bigEndian or littleEndian, depending on the target
We can't use a type alias for nativeEndian, nor can we do:
var NativeEndian bigEndian // or littleEndian
with build tags because then
x := binary.NativeEndian
x == binary.LittleEndian
or even
binary.NativeEndian == binary.LittleEndian
would only compile on certain machines.
And we can't do
var NativeEndian ByteOrder = LittleEndian
because then NativeEndian.Uint32 won't inline.
And then, separately, in x/sys/cpu
package cpu
const IsBigEndian = false
The duplication of the full implementation is clumsy enough to argue against this approach. Can you please explain better what the problems are with the obvious but apparently unsuitable solutions?
I believe that we could avoid duplicating the full implementation with a touch of embedding:
var NativeEndian nativeEndian
type nativeEndian struct {
littleEndian // or bigEndian
}
And just to throw out more ideas...another option for exposing a useful constant directly via encoding/binary could be to give nativeEndian (and bigEndian and littleEndian?) an IsLittleEndian/IsLittle method that returns a constant. The compiler should then be able to inline binary.NativeEndian.IsLittleEndian() and do all the constant propagation as needed from there.
And just to throw out more ideas...another option for exposing a useful constant directly via encoding/binary could be to give nativeEndian (and bigEndian and littleEndian?) an IsLittleEndian/IsLittle method that returns a constant.
I brought that up, but mostly as a joke: because the types are unexported, an exported method on an unexported type is (1) weird and (2) would not show up in godoc. And we can't add a method to https://pkg.go.dev/encoding/binary#ByteOrder without breaking code because that interface doesn't have an exported method, so people might've implemented it. We could add to the godoc on it and say "oh, and btw, there's a secret IsFooEndian method too" but that's weird.
@josharian's embedding approach looks good to me and seems to address @robpike's objection.
Updated title to add the x/sys/cpu.IsBigEndian constant in the concrete proposal.
Are there any remaining objections to this proposal?
Is binary.NativeEndian.IsLittleEndian included in the proposal, and if it is, why is x/sys/cpu.IsBigEndian still required?
If I'm honest I don't like these boolean methods, because there are schemes like BE-32 supported by older ARMs (e.g. ARMv6). BE-32 stores 0x0807060504030201 as 0x04, 0x03, 0x02, 0x01, 0x08, 0x07, 0x06, 0x05.
I also regard defining the CPU endianness as problematic, because CPUs like ARMv7 are able to support multiple endianness modes. For those CPUs the OS defines the endianness of the platform.
The current proposal does not include binary.NativeEndian.IsLittleEndian.
It's true that there are systems with various different forms of middle-endian order. Still, people do ask whether a system is big-endian or little-endian. And Go doesn't support any middle-endian systems and it's not clear that it ever will. Fortunately BE-32 mode has been deprecated. (And it's not clear that anybody will ever make a new processor that is not little-endian. Why would they?)
For Go endianness is represented using a different GOARCH value. For example, both arm64 and arm64be are valid GOARCH values. Similarly for ppc64 and ppc64le.
I also regard defining the CPU endianness as problematic, because CPUs like ARMv7 are able to support multiple endianness modes. For those CPUs the OS defines the endianness of the platform.
These values are reporting the interpretation of the memory in the currently executing process, not making a claim about all programs on the current CPU.
Based on the discussion above, this proposal seems like a likely accept. β rsc for the proposal review group