Definition of "isa" has an OS-specific vocabulary and various other corner cases

Open • smcv opened this issue 4 months ago • 0 comments

The isa field is currently defined to be a possible output of uname -m, which isn't necessarily a great fit for build systems for several reasons:

  • The existence of uname -m is a Unixism: as far as I'm aware, Windows doesn't have it at all. Is there a meaningful definition of what the isa should be on Windows, to distinguish between i386, x86_64 and others?

  • Different OSs represent the same ISA in uname -m differently. For example, Darwin's arm64 is the same as Linux's aarch64 according to GNU config.guess; the conventional Windows name for what Linux calls x86_64 is x64; and PowerPC is variously powerpc{,64} or ppc{,64}.

  • Sometimes the same ISA has multiple representations even on the same OS. For example, on Linux, i386 up to i686 are all the same ISA really, and semi-arbitrary strings like armv5tel are the same ISA as arm. The current CPS spec seems to consider i586 and i686 to be distinct ISAs, and similarly arm and armv5tel: it seems bad if a CPS-based build system is encouraged to crash out with an error like "you are compiling for i686, but the version of libfoo we found was for i586".

  • Some CPUs like PowerPC and ARM can be run in two modes, little-endian (LSB first) or big-endian (MSB first); some vocabularies of CPU families represent this as part of the architecture name, and some do not. For example, Linux uname -m on 64-bit PowerPC can output either ppc64 or ppc64le, but Meson considers both of those to be members of the ppc64 CPU family. At the moment CPS seems to consider ppc64 and ppc64le to be distinct, but it isn't clear whether this is really intentional.

(See GNU's /usr/share/misc/config.guess and /usr/share/misc/config.sub on a Linux system for many more examples of the output of uname -m needing normalization or postprocessing.)
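To make the normalization problem concrete, here is a minimal Python sketch of the kind of postprocessing a CPS consumer would need if isa stays as raw uname -m output. The names normalize_isa, same_isa and _ALIASES are hypothetical and not part of CPS; the alias table covers only the examples mentioned above and is nowhere near as complete as config.sub. It also deliberately folds ppc64le into ppc64, which throws away endianness, so a real design would have to carry that in a separate field.

```python
import re

# Hypothetical alias table (an assumption for illustration, not part of CPS):
# raw machine strings mapped to one canonical family name.
_ALIASES = {
    "amd64": "x86_64",      # common Windows/BSD spelling
    "x64": "x86_64",        # conventional Windows name
    "arm64": "aarch64",     # Darwin spelling of Linux's aarch64
    "powerpc": "ppc",
    "powerpc64": "ppc64",
    "powerpc64le": "ppc64",
    "ppc64le": "ppc64",     # endianness is lost here; track it separately
}


def normalize_isa(raw: str) -> str:
    """Map a uname -m style string to a canonical family name."""
    name = raw.strip().lower()
    # i386 through i686 are treated as one family.
    if re.fullmatch(r"i[3-6]86", name):
        return "x86"
    # Exact aliases are checked before the generic "arm" prefix rule,
    # so that arm64 becomes aarch64 rather than arm.
    if name in _ALIASES:
        return _ALIASES[name]
    # Strings such as armv5tel or armv7l are all 32-bit ARM.
    if name.startswith("arm"):
        return "arm"
    # Anything else is passed through unchanged.
    return name


def same_isa(a: str, b: str) -> bool:
    """Compare two raw machine strings by family instead of by spelling."""
    return normalize_isa(a) == normalize_isa(b)
```

With this, same_isa("i686", "i586") is true, so a build system comparing families rather than raw strings would not reject a libfoo built for i586 when compiling for i686.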

If the ISA is important information to appear in these files, I'd suggest having a normative vocabulary of architecture names, like Meson does: https://mesonbuild.com/Reference-tables.html#cpu-families (the table ends with "Any cpu family not listed in the above list is not guaranteed to remain stable in future releases").
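As a rough illustration of what a normative vocabulary buys, a consumer could reject anything outside a closed set instead of guessing. The sketch below is only an assumption about how that might look: KNOWN_FAMILIES is a small hand-picked subset of family names in the style of Meson's table, and check_isa is a hypothetical helper, not an existing CPS or Meson API.

```python
# Hypothetical closed vocabulary (illustrative subset, not a real spec).
KNOWN_FAMILIES = frozenset({
    "aarch64",
    "arm",
    "ppc",
    "ppc64",
    "x86",
    "x86_64",
})


def check_isa(value: str) -> str:
    """Return value unchanged if it is in the vocabulary, else raise."""
    if value not in KNOWN_FAMILIES:
        raise ValueError(
            f"unknown isa {value!r}; expected one of {sorted(KNOWN_FAMILIES)}"
        )
    return value
```

Under this scheme a producer that writes i686 or arm64 gets an error when the file is read, rather than two tools silently disagreeing about whether those strings match x86 or aarch64.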

Defining the OS as being uname -s has many of the same issues.

smcv • Sep 27 '24 18:09