
detecting `long double` format - accessing object file at configure time

Open rgommers opened this issue 2 years ago • 13 comments

This question is a little obscure, so apologies in advance. In porting NumPy to Meson, I seem to be running into an issue with detecting the binary representation of long double that the compiler uses. The old (distutils-based) build system does this in its setup.py code, as follows:

  1. Write some test C code using long double
  2. Compile it
  3. Grab the filename of the object file
  4. Pass that to some custom Python code that inspects the binary and determines the format
  5. Write the name of that format into a private config.h header file

This has to be done at configure time, because the rest of the build needs the defines in config.h. So I have two problems here in my Meson port: obtaining the name of the object file, and running that custom Python code at configure time. Unless I'm missing something, this is not possible.

I'm guessing that this kind of thing is not necessarily a good idea to add to Meson. An alternative could be to add a dedicated check - similar to compiler.sizeof.

It may be possible to derive the format from just sizeof('long double') + the platform/compiler if some of the 9 supported formats are obsolete. But it's very hard to be sure of that, and there's no good way of testing it for the more obscure representations. The formats are:

  • 8 byte little-endian (LE) and big-endian (BE)
  • 12-byte LE and BE
  • 5 formats with 16 bytes of "content", which may be 80-bit Intel representation or true 128-bit long double.

For more long double fun: on Windows + Mingw-w64 the default long double representation is 80-bit, however we build SciPy with a modified Mingw-w64 toolchain where long double is 64-bit, in order to be ABI-compatible with MSVC.

I think I'll try to check the size and endianness of the type and go from there, but if there are ways to replicate the "object file introspection" approach, that would be quite nice. Any suggestions on how best to approach that?

rgommers avatar Nov 17 '22 14:11 rgommers

Meson does support running the test output, but not passing the test output to a python program. Is it really necessary to run a script that inspects the bytes? 👀

eli-schwartz avatar Nov 17 '22 15:11 eli-schwartz

Is it really necessary to run a script that inspects the bytes?

Unfortunately, yes. long double as a type really should be obsolete, it's such a giant mess. There is no standard representation, and numpy manipulates the raw bits to implement math on long double in a portable fashion.

Unfortunately, deprecating long double support completely is a little contentious and even if the team agrees that would take several releases. So I don't want to mix that in with the Meson migration.

rgommers avatar Nov 17 '22 16:11 rgommers

Right now I'm just pushing trial-and-error commits to CI, because apparently the Linux x86-64 job on GitHub Actions has a format that does not match my local Linux desktop :(

rgommers avatar Nov 17 '22 16:11 rgommers

I determined the immediate offender for the runtime failure: the fallback code NumPy has for strtold_l, which was broken due to not being able to configure-detect the exact binary representation of long double. NumPy has a standalone library, libnpymath, which is shipped as a static library and is a C99 compatibility layer - for platforms that don't fully implement it (e.g., MSVC doesn't support complex types), have bugs in certain routines, or are still in pre-C99 land.

Those fallback paths are the problem - they're often not tested in CI, and then these weird failures show up. I can get away with not supporting them at all for a while probably, so this issue isn't super urgent. But when we ship the Meson build as the default, it would be nice if it worked.

rgommers avatar Nov 17 '22 17:11 rgommers

Maybe I'm missing the point, but can't you write a long double into a union like union { uint8_t bytes[16]; long double x; } and then inspect bytes to see how the thing is represented in memory?

dnicolodi avatar Nov 24 '22 15:11 dnicolodi

Thanks @dnicolodi, that is a good suggestion and does solve a part of the problem. I think I'm still missing a way to communicate the result back. Say I use compiler.run to build some C code that prints the raw bytes. I'm not sure how to get at the stdout of that, e.g. this results in "problem encountered: UNDEFINED":

res = cc.run('_build_utils/longdouble_check.c').stdout()
error(res)

If I translate everything to C, so there's no Python to worry about anymore, is there a way to run that C code and get the result of that into configure_file() somehow?

rgommers avatar Nov 24 '22 19:11 rgommers

What's the source code you're using?

eli-schwartz avatar Nov 24 '22 19:11 eli-schwartz

I haven't written that yet, because it'd be quite a bit of work. But something like:

#include <stdio.h>
#include <stdint.h>

union ld_bytes {
  uint8_t bytes[16];
  long double ld_val;
};

int main (void)
{
    union ld_bytes obj;
    obj.ld_val = 1.0;

    printf("\n long double value = [%Lf], bytes = [%u]\n", obj.ld_val, obj.bytes);

    return 0;
}

rgommers avatar Nov 24 '22 19:11 rgommers

At a very quick look, I don't think the above compiles. That's why you cannot get the standard output: it belongs to something that cannot be run. The last printf argument should be obj.bytes[0].

dnicolodi avatar Nov 24 '22 19:11 dnicolodi

It ran fine standalone. But I'll check tomorrow. If compiler.run() is supposed to be able to do this, then it'll work. The docs don't tell me that .stdout() is available though.

rgommers avatar Nov 24 '22 19:11 rgommers

It's documented here https://mesonbuild.com/Reference-manual_returned_runresult.html

dnicolodi avatar Nov 24 '22 19:11 dnicolodi

project('testlong', 'c')

cc = meson.get_compiler('c')

c = cc.run('chk.c')
message(c.compiled())
s = c.stdout()
message(s)
s = cc.run(files('chk.c')).stdout()
message(s)

outputs:

Message: false
Message: UNDEFINED
Message: 
 long double value = [1.000000], bytes = [3249673456]

Note that cc.run() expects as an argument one of two things:

  • a files() object pointing to a source code file
  • a string containing inline source code that will be written to a temporary file and compiled

eli-schwartz avatar Nov 24 '22 20:11 eli-schwartz

runresult.compiled() If true, the compilation succeeded, if false it did not and the other methods return unspecified data. This is only available for compiler.run() results.

In Meson's case, "return unspecified data" is implemented as "return the word UNDEFINED". muon (https://sr.ht/~lattis/muon/), a C reimplementation of Meson that works pretty well as long as you only need C/C++ and no modules (particularly not the import('python') module), instead does this:

if ((rr->flags & run_result_flag_from_compile)
    && !(rr->flags & run_result_flag_compile_ok)) {
        interp_error(wk, node, "this run_result was not run because its source could not be compiled");

Personally I think muon has better behavior here.

Meson's log has more details on what happened though:

Code:
 chk.c
Compiler stdout:
 
Compiler stderr:
 /home/eschwartz/git/meson/t/builddir/meson-private/tmpwp5na6ie/testfile.c:1:1: error: unknown type name 'chk'
chk.c
^
/home/eschwartz/git/meson/t/builddir/meson-private/tmpwp5na6ie/testfile.c:1:4: error: expected identifier or '('
chk.c
   ^
2 errors generated.

Could not compile test file /home/eschwartz/git/meson/t/builddir/meson-private/tmpwp5na6ie/testfile.c: 1

eli-schwartz avatar Nov 24 '22 20:11 eli-schwartz