Expansion of NAN/HUGE_VAL is a function address on Solaris

Open TCH68k opened this issue 5 months ago • 12 comments

I did

./configure --enable-optimizations --enable-shared ax_cv_c_float_words_bigendian=yes

and it failed with the error messages in the attached log.

The compiler is GCC 5.5.0 and configure set the C standard to C99; changing it to C11 in the Makefile did not help. Did I do something wrong, or does it just need a later GCC?

Any help is much appreciated.

TCH68k avatar Jun 26 '25 23:06 TCH68k

Solaris is not supported? Python's webpage said it should compile.

TCH68k avatar Jun 27 '25 00:06 TCH68k

Yeah, Solaris isn't an officially supported platform. See PEP 11 for the official list of supported platforms. AFAIK, recent versions support Solaris, but 3.9 is five years old at this point, and even if we wanted to fix it, 3.12 and older only get security fixes.

Are you able to compile on 3.13+? That's probably what the webpage is referring to.

ZeroIntensity avatar Jun 27 '25 00:06 ZeroIntensity

I tried to compile 3.13.5. configure worked, although it emitted a warning:

configure: WARNING:

Platform "sparc-sun-solaris2.10" with compiler "gcc" is not supported by the
CPython core team, see https://peps.python.org/pep-0011/ for more information.

But compilation does not even start when I type make (it is gmake 4.4.1):

-c: -c: cannot open
-c: -c: cannot open
make: *** [Makefile:883: profile-run-stamp] Error 1

TCH68k avatar Jun 27 '25 01:06 TCH68k

What make and configure arguments did you use?

ZeroIntensity avatar Jun 27 '25 01:06 ZeroIntensity

Same as I did for 3.9: --enable-optimizations --enable-shared ax_cv_c_float_words_bigendian=yes for configure, and nothing for make.

TCH68k avatar Jun 27 '25 02:06 TCH68k

Py_HUGE_VAL has the type __attribute__((const)) double (*)(void), because that's how Solaris defines these macros (see https://www.gnu.org/software/gnulib/manual/html_node/math_002eh.html):

The macros NAN and HUGE_VAL expand to a function address rather than a floating point constant on some platforms: Solaris 10.

Because of this, we can't perform arithmetic operations on it. The issue will always persist, but I think we can maybe do something by changing it to -__builtin_huge_val() to get the negative HUGE_VAL.

However, there are likely to be other issues, especially if we're doing negative NaNs. So I don't think it's an easy task.
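As a rough analogy only (this is Python, not the C code in question): negating a function object fails in much the same way as negating a function address, while negating the result of calling it works. The name huge_val is a hypothetical stand-in for a HUGE_VAL that is a function rather than a constant, and math.inf stands in for what __builtin_huge_val() would return.

```python
import math


def huge_val():
    """Hypothetical stand-in for a HUGE_VAL that is a function, not a constant."""
    return math.inf


def negate_reference():
    """Try -huge_val without calling it: the operand is a function object."""
    try:
        return -huge_val
    except TypeError as exc:
        # Unary minus is not defined for function objects.
        return "TypeError: " + str(exc)


def negate_call():
    """Call first, then negate: the idea behind -__builtin_huge_val()."""
    return -huge_val()
```

negate_reference() reports a TypeError, whereas negate_call() returns negative infinity; the C compiler rejects the first form for the same kind of reason.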

picnixz avatar Jun 27 '25 14:06 picnixz

What if I define some global variables (double PYTHON_NAN, PYTHON_HUGE_VAL;) and assign them in the "init code" of the Python interpreter

PYTHON_NAN = NAN();
PYTHON_HUGE_VAL = HUGE_VAL();

and then use these "proxy variables" everywhere? Stupid idea, or okay for a workaround?

TCH68k avatar Jun 27 '25 15:06 TCH68k

and assign them in the "init code" of the Python interpreter

Define Py_HUGE_VAL and Py_NAN before including Python and it should be fine.

picnixz avatar Jun 27 '25 15:06 picnixz

"Including Python"? You mean the global header file of Python? What definition should I give them? If I do

#define Py_HUGE_VAL HUGE_VAL()
#define Py_NAN NAN()

would that not result in the same errors?

TCH68k avatar Jun 27 '25 15:06 TCH68k

Py_{INFINITY,HUGE_VAL,NAN} are defined in pymath.h but only if they don't already exist. Therefore, you can update pyconfig.h (which is included before pymath.h) as follows:

#define Py_HUGE_VAL (__builtin_huge_val())
#define Py_INFINITY (__builtin_inff())
#define Py_NAN      (__builtin_nanf(""))

Hopefully, this will properly fool the compiler, though I'm not entirely sure that it's safe. It seems to be a known old issue, though, that should have been fixed in GCC 4.x: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19933.
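One way to gain some confidence that such an override is safe is to smoke-test float behaviour in the interpreter it produces. A minimal sketch, assuming it is run with the freshly built python; the function name float_sanity_problems is hypothetical, and the checks are only a sample, not a full IEEE 754 test.

```python
import math


def float_sanity_problems():
    """Return the names of failed checks; an empty list means inf/nan look sane."""
    inf = float("inf")
    nan = float("nan")
    checks = {
        "inf is positive infinity": math.isinf(inf) and inf > 0,
        "-inf is negative infinity": math.isinf(-inf) and -inf < 0,
        "math.inf equals float('inf')": math.inf == inf,
        "nan is NaN": math.isnan(nan) and math.isnan(math.nan),
        "nan compares unequal to itself": nan != nan,
        "overflowing multiplication gives inf": 1e308 * 10 == inf,
        "negated nan has its sign bit set": math.copysign(1.0, -math.nan) == -1.0,
    }
    return [name for name, ok in checks.items() if not ok]
```

If float_sanity_problems() returns a non-empty list on the new build, the definitions in pyconfig.h deserve a second look.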

picnixz avatar Jun 27 '25 15:06 picnixz

Can you also check if simply upgrading gcc or switching to clang solves the issue? It might be an easier solution if this works.

picnixz avatar Jun 27 '25 15:06 picnixz

Thank you, that totally worked. Python has been successfully built; it complains that it has missing parts, but the interpreter works and can work with inf and nan. Caveat: ar is not in the $PATH by default on Solaris 10, so either $PATH must contain /usr/ccs/bin, or the admin must symlink /usr/sfw/bin/gar as ar into a directory that is in $PATH. Otherwise the build will fail when a static library needs to be packed, regardless of whether you've specified --enable-shared.

As for the missing bits, I've attached the log of the messages. I've installed the needed libs and header files for the "necessary bits", but the build does not see them. How can I add the includes of OpenCSW to the build process? The modules that failed to build also need some libs or headers installed, but I have no idea which. I think for _ctypes it is libffi_dev, but that is also in the OpenCSW include dir.

Shall I open a new ticket for this one, as it is no longer related to the NAN/HUGE_VAL problem?

Edit: I intended to upgrade GCC anyway, so I can test whether I can build Python without the definitions you gave me, but I doubt that building with a later GCC will solve the problem of NAN/HUGE_VAL being functions; your defs will probably be needed anyway.

Edit 2: GCC6 cannot compile for this very same reason of NAN and INFINITY being functions, so I'll have to patch GCC6 too, but thanks to you I now know what to do.

TCH68k avatar Jun 28 '25 01:06 TCH68k

Can you give me the full configure traceback and the full make log please? For ffi.h, either it can't find it or it's not installed.

How can i add the includes of OpenCSW to the build process

You can do ./configure CFLAGS='-I/path/to/opencsw' where /path/to/opencsw should contain the include files (I don't know what this folder is actually).

Caveat: ar is not in the $PATH by default on Solaris 10, so either $PATH must contain /usr/ccs/bin,

You can also change AR when doing make AR=/your/path/to/your/ar -j12. It'll overwrite the AR variable (I think). Or if this doesn't work, update Makefile.pre.in with your own AR.

picnixz avatar Jun 29 '25 14:06 picnixz

Can you give me the full configure traceback and the full make log please? For ffi.h, either it can't find it or it's not installed.

Well, here is the log of ./configure and here is config.log, which I found in Python's directory. As for the log of make, I tried 2>&1 1>logfile.log, but for some reason it still printed messages to the terminal (maybe because I run make through a small script, which is actually time make ; audioplay xyz.au). Fortunately the output was smaller than the terminal's buffer, but the log is still split, sorry for that. Here is the logfile from the redirection and here is the one with the messages from the terminal.

You can do ./configure CFLAGS='-I/path/to/opencsw' where /path/to/opencsw should contain the include files (I don't know what this folder is actually).

Thanks. It is /opt/csw/include, but nothing changed. Python still builds, but still fails to find the libs I've installed.

You can also change AR when doing make AR=/your/path/to/your/ar -j12. It'll overwrite the AR variable (I think). Or if this doesn't work, update Makefile.pre.in with your own AR.

I think a simple, persistent symlink is more convenient than passing ar's location each time or manually changing Makefile.pre.in, but thanks for the tip.

Edit: Trying it with a newer GCC is stalled, because something in the GCC6 build process does not link; it dies with the error message

ld: fatal: file elf64_sparc: open failed: No such file or directory

and the only related page I found is this bug ticket, which did not solve my problem, so I have to file a new ticket in the GCC bug tracker to ask for help and wait for their response.

TCH68k avatar Jun 29 '25 22:06 TCH68k

You had 2>&1 1>logfile.log in the wrong order. 1>logfile.log 2>&1 will redirect both stdout and stderr to a file.
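The order matters because redirections are applied left to right: 2>&1 first points stderr at wherever stdout currently goes (the terminal), and only then does 1>logfile.log move stdout to the file. For comparison, here is a minimal Python sketch of the correct combination, assuming the build is driven from a script; run_logged and the log file name in the usage note are hypothetical.

```python
import subprocess
import sys


def run_logged(cmd, log_path):
    """Run cmd with stdout and stderr both written to log_path.

    Same effect as the shell form `cmd 1>log_path 2>&1`.
    """
    with open(log_path, "wb") as log:
        # stdout goes to the file; stderr is then sent to the same place.
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode
```

For example, run_logged(["gmake"], "python_make.log") would collect the whole build output in one file.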

oskar-skog avatar Jun 30 '25 07:06 oskar-skog

The thought had crossed my mind, but I dismissed it. Thank you for pointing it out. The full make log is now attached and, to avoid confusion, I attach the other two logs as well.

python3.9_configure.log python3.9_config.log python3.9_make.log

TCH68k avatar Jun 30 '25 23:06 TCH68k

I've reported the problem with GCC6 to the devs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120905

This is not directly related; it is just a status update on my attempt to compile Python with a newer GCC.

TCH68k avatar Jul 01 '25 03:07 TCH68k

This issue seems like a duplicate of issue gh-90798 which was solved by commit 54842e4311bb0e34012d1984b42eab41eeeaea6a:

commit 54842e4311bb0e34012d1984b42eab41eeeaea6a
Author: Victor Stinner <[email protected]>
Date:   Sun Feb 6 13:13:04 2022 +0100

    bpo-46640: Py_NAN now uses the C99 NAN constant (GH-31134)
    
    Building Python now requires a C99 <math.h> header file providing a
    NAN constant, or the __builtin_nan() built-in function. If a platform
    does not support Not-a-Number (NaN), the Py_NO_NAN macro can be
    defined in the pyconfig.h file.

vstinner avatar Jul 02 '25 15:07 vstinner

The problem is actually the way those macros are defined: on Solaris they are function addresses, so we can't write -Py_NAN even if we have a NAN macro in some math.h. One suggestion I can make is that we have a conditional Py_NAN that expands to the __builtin_... function call instead for Solaris, or wait for GCC to fix this. Same for HUGE_VAL.

picnixz avatar Jul 08 '25 14:07 picnixz

I tried to compile 3.13.5. The configure worked, albeit it threw a warning: (...) But the compiling does not even start at all, when i type make (it is gmake 4.4.1): -c: -c: cannot open -c: -c: cannot open make: *** [Makefile:883: profile-run-stamp] Error 1

Try to build in debug mode without shared library, it's simpler: ./configure --with-pydebug ax_cv_c_float_words_bigendian=yes. Does it also fail?

If it fails, you may try to run make SHELL="bash -x" to debug the build (print executed commands).

Python 3.9 no longer accepts bugfixes.

vstinner avatar Jul 08 '25 17:07 vstinner

Thanks, but Python already builds; the problem now is that it fails to detect some libraries on the system, so it does not build some essential modules, such as the one for OpenSSL. (They are all installed.) But I think that is because I did not pass LD_LIBRARY_PATH='/opt/csw/lib:'"$LD_LIBRARY_PATH" (or maybe it's $LIBRARY_PATH?) along with CFLAGS='-I/opt/csw/include' (or add -L/opt/csw/lib to $CFLAGS too) when I ran ./configure. I intend to try this, but currently my Sun Blade 100 is occupied with compiling GCC6. The process takes roughly 31 hours, so I estimate it will be finished in 8 hours.

TCH68k avatar Jul 08 '25 18:07 TCH68k

Ok, if upgrading GCC doesn't work, let's open a separate issue for the build failures on Solaris as it's no longer related to the macro expansions themselves.

For this specific issue, I think it'd be better to avoid people relying on hacking their pyconfig.h if possible. Actually 7a3b03509e5e3e72d8c47137579cccb52548a318 removed the fallback to __builtin_nan which was one reason why we can't do -Py_NAN. OTOH, Py_HUGE_VAL was never correctly defined for Solaris and I guess that builds could have succeeded until then because we didn't have code with -Py_HUGE_VAL (I honestly don't know why it started failing, maybe because of some solaris / gcc version).

Now, I should say that Py_HUGE_VAL is soft-deprecated, meaning that it's not used in the latest main (since 8477951a1c460ff9b7dc7c54e7bf9b66b1722459). There will be a need to hack pyconfig.h.in for Python 3.13 only but not for Python 3.14+. On the other hand, I can't find places with -Py_NAN (we do -fabs(Py_NAN) though) but we might want to partially revert 7a3b03509e5e3e72d8c47137579cccb52548a318 which removed the __builtin_nan call.

In summary, for this issue:

  • I plan to open a PR against Python 3.13 which fixes Py_HUGE_VAL.
  • I plan to open a PR against main which fixes the use of Py_NAN (that is for compatibility purposes)
  • Then I'll close the issue. If your extension issues persist, please open a separate issue, as I presume they are not related to this one.

picnixz avatar Jul 08 '25 19:07 picnixz

Acknowledged. In the following days, I'll try to build Python 3.9 with GCC5, with a patched pyconfig.h and passing LD_LIBRARY_PATH='/opt/csw/lib:'"$LD_LIBRARY_PATH" CFLAGS='-I/opt/csw/include -L/opt/csw/lib' to ./configure, and then with GCC6: first without any changes, then, if it fails on the floats or the extensions, with the patch and/or the extra flags. (But I do think that with the correct flags GCC5 will do the trick too, and that the patch and extra flags will be needed with GCC6 as well.)

TCH68k avatar Jul 08 '25 21:07 TCH68k

Try to build Python 3.13 and Python 3.14 but not Python 3.9. Python 3.9 has too many differences and we will not fix anything there except for security reasons.

picnixz avatar Jul 08 '25 21:07 picnixz

I already tried to build Python 3.13 without any success. Most probably it uses dependencies, syntax or technologies which are not available on Solaris 10. Python 3.9 at least compiles, 3.9 is currently enough for my aims, and the float problem is already "fixed". (But I will try to compile later Python versions too; perhaps 3.10 and even 3.11 will still work on Solaris 10.)

TCH68k avatar Jul 08 '25 23:07 TCH68k

-c: -c: cannot open
-c: -c: cannot open
make: *** [Makefile:883: profile-run-stamp] Error 1

Have you tried to build it like this:

Try to build in debug mode without shared library, it's simpler: ./configure --with-pydebug ax_cv_c_float_words_bigendian=yes. Does it also fail?

The issue you had here is probably with LLVM dependencies. I really want to know which compilation issues we have purely in terms of C code, not in terms of dependencies.

picnixz avatar Jul 12 '25 11:07 picnixz

I've opened a new bugticket under #136604 for the problems with 3.13.

And now my final report on building 3.9: I've finally managed to build it (and BTW 3.10 too) without any problems and with all modules (except ossaudiodev, for which I have no idea what to install and which I do not need at all).

For the error of

/Python-3.9.23/Modules/_ctypes/_ctypes.c:107:17: fatal error: ffi.h: No such file or directory

I had to install libffi manually. 3.5.1 compiled and installed without any problem and Python can use it; just specify --with-system-ffi when running ./configure. This fixes the failure to build the _ctypes module.

For the error of

/Python-3.9.23/Modules/socketmodule.c: In function 'socket_sethostname':
/Python-3.9.23/Modules/socketmodule.c:5518:15: error: implicit declaration of function 'sethostname' [-Werror=implicit-function-declaration]
         res = sethostname(buf.buf, buf.len);
               ^
cc1: some warnings being treated as errors

I had to edit Modules/socketmodule.c and change the condition at

#ifdef _AIX
/* issue #18259, not declared in any useful header file */
extern int sethostname(const char *, size_t);
#endif

to

#if (defined(_AIX) || (defined(__sun) && defined(__SVR4) && (Py_SUNOS_VERSION <= 510)))

This fixed the failure to build the _socket module, and also _asyncio, which depends on it. Note: This problem was fixed in Python 3.11.

For the errors of

/Python-3.9.23/Modules/_multiprocessing/posixshmem.c: In function '_posixshmem_shm_open_impl':
/Python-3.9.23/Modules/_multiprocessing/posixshmem.c:51:14: error: implicit declaration of function 'shm_open' [-Werror=implicit-function-declaration]
         fd = shm_open(name, flags, mode);
              ^
/Python-3.9.23/Modules/_multiprocessing/posixshmem.c: In function '_posixshmem_shm_unlink_impl':
/Python-3.9.23/Modules/_multiprocessing/posixshmem.c:90:14: error: implicit declaration of function 'shm_unlink' [-Werror=implicit-function-declaration]
         rv = shm_unlink(name);
              ^
cc1: some warnings being treated as errors

I had to change Modules/_multiprocessing/posixshmem.c and add

#if (defined(__sun) && defined(__SVR4) && (Py_SUNOS_VERSION <= 510))
#define _XPG4_2
#endif

directly before

#include <sys/mman.h>

in

// for shm_open() and shm_unlink()
#ifdef HAVE_SYS_MMAN_H
#include <sys/mman.h>
#endif

This fixed the failure to build the _posixshmem module. Note: This problem is still not fixed as of Python 3.13.5.

For the problems with SSL, the solution was a bit convoluted. First, install OpenSSL 3.3.4 with the configuration of

./Configure solaris-sparcv8-gcc --release --api=1.1.1 enable-weak-ssl-ciphers no-acvp-tests no-external-tests no-tests no-unit-test -latomic -lrt

Note the --api=1.1.1 flag: Python was unable to use OpenSSL if the default(?) 3.0 API was used. Also note the additional -latomic and -lrt flags (OpenSSL passes any unknown argument given to Configure unchanged to the compiler/linker): without them the entire build fails because of the missing clock_gettime symbol from librt and the missing atomic_* symbols from libatomic. Once this is done, there is another problem: for some reason Python does not detect this OpenSSL properly and confuses it with OpenSSL 0.9.7 from the base Solaris 10 system in /usr/sfw and OpenSSL 1.1.1w from OpenCSW in /opt/csw. I found two ways of bypassing this:

  • Configure Python with the --with-openssl=/usr/local flag and, after gmake reports the failure to build the SSL module, run gmake again; the SSL errors are then gone. (This must be some dependency problem; I mean it tries to build SSL before one or more of its dependencies.)
  • Simply disable the OpenSSL in /usr/sfw and /opt/csw for the duration of the compilation by renaming $BASE/include/openssl, $BASE/lib/libssl.so and $BASE/lib/libcrypto.so to something else. (By $BASE I mean /usr/sfw and /opt/csw: the include directory and the two libraries must be renamed in both base directories.)

This fixes both _ssl and _hashlib.
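With three OpenSSL installations on the machine, it may be worth checking which one the finished interpreter actually loads. A minimal sketch, assuming it is run with the newly built python; the function name linked_openssl is hypothetical, while ssl.OPENSSL_VERSION and ssl.OPENSSL_VERSION_INFO are the standard library attributes.

```python
def linked_openssl():
    """Return (version string, version tuple) of the OpenSSL in use,
    or None if the ssl module could not be imported (e.g. _ssl was not built)."""
    try:
        import ssl
    except ImportError:
        return None
    return ssl.OPENSSL_VERSION, ssl.OPENSSL_VERSION_INFO
```

If the version string names 0.9.7 or 1.1.1w instead of the 3.3.4 build, the wrong library was picked up.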

A brief aside, before I forget: occasionally (I have no idea what the trigger was) the pyexpat module failed to build. This was fixed permanently by adding the --with-system-expat flag to ./configure. (Of course, the library and its headers must be installed from OpenCSW.)

Now, for the modules which could not be built, because of "missing bits". For _gdbm, _lzma and readline, simply install their libraries and developer files/headers from OpenCSW and pass CFLAGS='-I/usr/local/include -L/usr/local/lib -I/opt/csw/include -L/opt/csw/lib' LDFLAGS='-L/usr/local/lib -L/opt/csw/lib' to ./configure.

Be advised that these flags will wreak havoc on Python detecting and using curses/ncurses: the _curses module will fail because of the missing _unctrl symbol. The cause is that Python now detects the ncurses in /opt/csw, which does not have that function, and so the build fails. (Disabling it will not work: if you disable ncurses the same way as you did with OpenSSL, Python will fail for another reason: it still passes the -lncurses flag to the linker, which now fails.) The workaround is to pass CFLAGS='-lcurses' and LIBS='-lcurses' to ./configure. This makes Python use ncurses in general, but reach for the old curses for unctrl(). It is not a nice solution, but it works. (I could not find any way to force Python to use the plain curses in /usr and ignore any other (n)curses(w).)

For _sqlite3 the solution is almost the same as for the previous three modules, but after installing its library and headers and passing the flags to ./configure, Python still does not recognize it. I had to modify setup.py a bit: I added '/opt/csw/include/' to the sqlite_inc_paths list in detect_sqlite. After that, Python finds it too.
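A minimal sketch of why the extra list entry helps, assuming the detection amounts to looking for sqlite3.h in a fixed list of directories. That is an assumption about how detect_sqlite behaves, not a copy of its code, and find_header is a hypothetical helper.

```python
import os


def find_header(name, search_dirs):
    """Return the first directory in search_dirs that contains name, else None."""
    for directory in search_dirs:
        if os.path.isfile(os.path.join(directory, name)):
            return directory
    return None
```

Under that assumption, a header living only in /opt/csw/include stays invisible until that directory is part of the searched list, for example find_header("sqlite3.h", ["/usr/include", "/opt/csw/include/"]).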

Finally, _tkinter. Install the libs of Tcl and Tk and their headers from OpenCSW. Pass --with-tcltk-includes='-I/opt/csw/include' --with-tcltk-libs='-L/opt/csw/lib' and LIBS='-ltcl8.5 -ltclstub8.5 -ltk8.5 -ltkstub8.5' to ./configure. And it is done, this last module is fixed.

Well, actually the last module is ossaudiodev as i mentioned in the beginning, but that is not important. (Well, yeah, _tkinter was not important either.)

So, after modifying the files the full ./configure command to execute is:

./configure --enable-optimizations --enable-shared --prefix=/opt/python3.9 ax_cv_c_float_words_bigendian=yes --with-system-expat --with-system-ffi --with-tcltk-includes='-I/opt/csw/include' --with-tcltk-libs='-L/opt/csw/lib' CFLAGS='-I/usr/local/include -L/usr/local/lib -I/opt/csw/include -L/opt/csw/lib -lcurses' LDFLAGS='-L/usr/local/lib -L/opt/csw/lib' LIBS='-lcurses -ltcl8.5 -ltclstub8.5 -ltk8.5 -ltkstub8.5'

Then modify pyconfig.h and add

#define Py_HUGE_VAL (__builtin_huge_val())
#define Py_INFINITY (__builtin_inff())
#define Py_NAN      (__builtin_nanf(""))

Without these lines, GCC5 fails to build Python with the error messages in the opening post, and GCC6 fails later: the python binary built that way is used for further build steps, where it immediately crashes with SIGSEGV and thus halts the entire process. (To be honest, I only tried building with GCC6 without these definitions once, and that was the result. I did not have the inclination to re-test... Better safe than sorry.) Then, finally, run gmake. (Run it twice if you use OpenSSL via the --with-openssl=/usr/local flag passed to ./configure.)

And that's it. Python 3.9 has been successfully built. And as a matter of fact, as I mentioned at the beginning, Python 3.10 can be built with the very same method.
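To confirm that the modules discussed above really ended up in the build, something like the following could be run with the new interpreter. It is only a sketch: the list is taken from the module names mentioned in this report, and missing_modules is a hypothetical helper, not part of Python's build system.

```python
import importlib

# Extension modules mentioned in this report (ossaudiodev left out on purpose).
OPTIONAL_MODULES = [
    "_ctypes", "_socket", "_asyncio", "_posixshmem", "_ssl", "_hashlib",
    "pyexpat", "_gdbm", "_lzma", "readline", "_curses", "_sqlite3", "_tkinter",
]


def missing_modules(names=OPTIONAL_MODULES):
    """Return the names that the running interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

An empty result from missing_modules() would mean every listed module imports; anything it returns still needs attention.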

I was not as lucky with Python 3.11. By default it died because the non-GNU ld failed:

/usr/ccs/bin/ld: illegal option -- no-as-needed

If I passed LDSHARED=$(command -v ld) to ./configure, it used the GNU ld at /usr/local/bin/ld and died with these messages:

/usr/local/bin/ld: unrecognized option -Wl,-hlibpython3.11.so.1.0
/usr/local/bin/ld: use the --help option for usage information
ln: cannot access libpython3.11.so.1.0
make: *** [Makefile:889: libpython3.11.so] Error 2

At this point it was very late (or more precisely: very early in the morning), and since Python 3.10 is more than enough for my needs (the Devuan 4 on my main machine only has Python 3.9.2 and every Python thing I use works), "we called it a draw" with Python.

Thanks to everyone for helping. Still, if you have ideas about why Python 3.11 does not work with either linker, I'd appreciate it if you shared them. As for the currently supported Python 3.13, we can continue in #136604.

Edit: Forgot to add patches. Attached now.

p39s10.patch.txt p310s10.patch.txt

TCH68k avatar Jul 13 '25 04:07 TCH68k

cc @kulikjak

serhiy-storchaka avatar Jul 13 '25 11:07 serhiy-storchaka

cc @jcea

serhiy-storchaka avatar Jul 13 '25 11:07 serhiy-storchaka

Hi, sorry, I missed the notification for this issue.

We have been building all Python runtimes since 3.4 for Solaris 11.4, and they work very well (with just a few issues here and there :)). We never delivered Python 3 for Solaris 10, though, so I am not familiar with many of the issues you are seeing.

As for the problems with Py_HUGE_VAL/Py_NAN I unfortunately cannot reproduce this even on Solaris 10 (more in #136575).

sethostname was indeed not defined in Solaris 10 headers yet.

There is a Solaris buildbot worker which might help you with what configure/compilation flags and environment variables need to be set: https://buildbot.python.org/all/#/builders/1262 (we are setting some of those to successfully support the curses and openssl bits).

And here is a GitHub mirror of our internal repo, with all the additional patches we apply on top of the Python 3.9 sources to build and test it successfully (many of which you won't need, but some might help): https://github.com/oracle/solaris-userland/tree/fe5533d1de936a5632fddb033b1b593f5e7b96c1/components/python/python39/patches A lot of the important ones are merged into main now, but 3.9 is quite old and is missing them.

kulikjak avatar Sep 10 '25 07:09 kulikjak