LibreELEC.tv

[BUG][LE10] Build fails due to glib error when cross-compiling on AARCH64 host

Open pretoriano80 opened this issue 2 years ago • 11 comments

Describe the bug

The build fails when cross-compiling on an AARCH64 host, due to a glib error (frexpl() is missing or broken beyond repair).

Information

  • LE Version: LE10
  • Hardware Platform: RPi4
  • Build system: Ubuntu 20.04 aarch64

Log file

frexpl() is missing or broken beyond repair

Additional context

I found via Google that @heitbaum ran into the same issue a few months ago, and that it has an easy fix.

Fix

The following patch addresses the issue, but I don't know whether it causes problems on other build hosts:

diff --git a/glib/gnulib/meson.build b/glib/gnulib/meson.build
index 38b530aa0..4408a3dc6 100644
--- a/glib/gnulib/meson.build
+++ b/glib/gnulib/meson.build
@@ -302,12 +302,12 @@ else
   gl_cv_func_frexpl_broken_beyond_repair = true
 endif
 
-if not gl_cv_func_frexp_works and gl_cv_func_frexp_broken_beyond_repair
-  error ('frexp() is missing or broken beyond repair, and we have nothing to replace it with')
-endif
-if not gl_cv_func_frexpl_works and gl_cv_func_frexpl_broken_beyond_repair
-  error ('frexpl() is missing or broken beyond repair, and we have nothing to replace it with')
-endif
+#if not gl_cv_func_frexp_works and gl_cv_func_frexp_broken_beyond_repair
+#  error ('frexp() is missing or broken beyond repair, and we have nothing to replace it with')
+#endif
+#if not gl_cv_func_frexpl_works and gl_cv_func_frexpl_broken_beyond_repair
+#  error ('frexpl() is missing or broken beyond repair, and we have nothing to replace it with')
+#endif
 
 math_h_config.set ('REPLACE_FREXP', gl_cv_func_frexp_works ? 0 : 1)
 math_h_config.set ('REPLACE_FREXPL', gl_cv_func_frexpl_works ? 0 : 1)
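
For context, the meson logic above decides whether the platform's frexpl() can be trusted, and the patch simply comments out the hard error. A minimal C sketch of what "works" means for frexpl(), assuming only the C standard contract (x == m * 2^e with 0.5 <= |m| < 1). This is not the probe gnulib/glib actually runs; frexpl_ok_for() and frexpl_seems_to_work() are hypothetical names, and the sample values are arbitrary.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* True if frexpl(x, &e) meets the C standard contract for x:
 * x == m * 2^e with 0.5 <= |m| < 1, and m == 0, e == 0 when x == 0. */
static bool frexpl_ok_for(long double x)
{
    int e = 0;
    long double m = frexpl(x, &e);
    if (x == 0.0L)
        return m == 0.0L && e == 0;
    long double am = fabsl(m);
    /* Scaling by a power of two is exact, so ldexpl must rebuild x. */
    return am >= 0.5L && am < 1.0L && ldexpl(m, e) == x;
}

/* Spot-checks a few values, including a subnormal long double
 * (LDBL_MIN / 4). Illustrative only: not the probe gnulib runs. */
static bool frexpl_seems_to_work(void)
{
    const long double samples[] = {
        1.0L, -3.5L, 0.0L, 1e300L, 1e-300L, LDBL_MIN / 4
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++)
        if (!frexpl_ok_for(samples[i]))
            return false;
    return true;
}
```

On a healthy libc both helpers return true; link with -lm if your toolchain needs it.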

pretoriano80, Mar 30 '22 08:03

@pretoriano80 - I think this is the fix that actually “fixed” the issue properly.

  • #5276

heitbaum, Mar 30 '22 09:03

OK, I will merge it into my LE10 branch and test it. I can test on both x86/64 and aarch64.

pretoriano80, Mar 30 '22 09:03

Here are my test results so far:

Build system: Ubuntu 20.04 AARCH64
Build test:
  • PROJECT=Generic ARCH=x86_64 --> build OK
  • PROJECT=RPi DEVICE=RPi4 ARCH=arm --> build OK
Results: https://github.com/LibreELEC/LibreELEC.tv/pull/5276 addressed the issue.

Build system: Ubuntu 20.10 x86/64
Build test:
  • PROJECT=Generic ARCH=x86_64 --> build FAILED
  • PROJECT=RPi DEVICE=RPi4 ARCH=arm --> build FAILED
Results: https://github.com/LibreELEC/LibreELEC.tv/pull/5276 addressed this issue, but the build then failed due to other m4 and glib errors.

Fix: additional patches are required. NOTE: the following patches don't seem to cause problems on other build hosts (tested on Ubuntu 20.04 AARCH64).

m4 patch -> reference

diff --git a/lib/c-stack.c b/lib/c-stack.c
index 5353c08..863f764 100644
--- a/lib/c-stack.c
+++ b/lib/c-stack.c
@@ -51,13 +51,14 @@
 typedef struct sigaltstack stack_t;
 #endif
 #ifndef SIGSTKSZ
-# define SIGSTKSZ 16384
-#elif HAVE_LIBSIGSEGV && SIGSTKSZ < 16384
+#define get_sigstksz()  (16384)
+#elif HAVE_LIBSIGSEGV
 /* libsigsegv 2.6 through 2.8 have a bug where some architectures use
    more than the Linux default of an 8k alternate stack when deciding
    if a fault was caused by stack overflow.  */
-# undef SIGSTKSZ
-# define SIGSTKSZ 16384
+#define get_sigstksz() ((SIGSTKSZ) < 16384 ? 16384 : (SIGSTKSZ))
+#else
+#define get_sigstksz() ((SIGSTKSZ))
 #endif
 
 #include <stdlib.h>
@@ -131,7 +132,8 @@ die (int signo)
 /* Storage for the alternate signal stack.  */
 static union
 {
-  char buffer[SIGSTKSZ];
+  /* allocate buffer with size from get_sigstksz() */
+  char *buffer;
 
   /* These other members are for proper alignment.  There's no
      standard way to guarantee stack alignment, but this seems enough
@@ -203,10 +205,11 @@ c_stack_action (void (*action) (int))
   program_error_message = _("program error");
   stack_overflow_message = _("stack overflow");
 
+  alternate_signal_stack.buffer = malloc(get_sigstksz());
   /* Always install the overflow handler.  */
   if (stackoverflow_install_handler (overflow_handler,
                                      alternate_signal_stack.buffer,
-                                     sizeof alternate_signal_stack.buffer))
+                                     get_sigstksz()))
     {
       errno = ENOTSUP;
       return -1;
@@ -279,14 +282,15 @@ c_stack_action (void (*action) (int))
   stack_t st;
   struct sigaction act;
   st.ss_flags = 0;
+  alternate_signal_stack.buffer = malloc(get_sigstksz());
 # if SIGALTSTACK_SS_REVERSED
   /* Irix mistakenly treats ss_sp as the upper bound, rather than
      lower bound, of the alternate stack.  */
-  st.ss_sp = alternate_signal_stack.buffer + SIGSTKSZ - sizeof (void *);
-  st.ss_size = sizeof alternate_signal_stack.buffer - sizeof (void *);
+  st.ss_sp = alternate_signal_stack.buffer + get_sigstksz() - sizeof (void *);
+  st.ss_size = get_sigstksz() - sizeof (void *);
 # else
   st.ss_sp = alternate_signal_stack.buffer;
-  st.ss_size = sizeof alternate_signal_stack.buffer;
+  st.ss_size = get_sigstksz();
 # endif
   r = sigaltstack (&st, NULL);
   if (r != 0)
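
The m4 patch replaces the compile-time `char buffer[SIGSTKSZ]` with a heap buffer sized at runtime. That matters because glibc 2.34 and later redefine SIGSTKSZ as sysconf(_SC_SIGSTKSZ) when _GNU_SOURCE is defined, so it can no longer size a static array or be compared in an `#if`. A minimal standalone C sketch of the same idea, assuming Linux with sigaltstack(); alt_stack_size() and install_alt_stack() are hypothetical names, and unlike the patch this sketch always enforces the 16384-byte minimum and frees the buffer if sigaltstack() fails.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdlib.h>
#include <unistd.h>

/* Minimum size the m4 patch enforces (libsigsegv workaround). */
#define MIN_ALT_STACK 16384

/* Size for the alternate signal stack, computed at runtime. */
static size_t alt_stack_size(void)
{
    long sz = -1;
#ifdef _SC_SIGSTKSZ
    sz = sysconf(_SC_SIGSTKSZ);   /* only available on glibc >= 2.34 */
#endif
    if (sz <= 0)
        sz = SIGSTKSZ;            /* constant on older libcs */
    assert(sz > 0);
    return sz < MIN_ALT_STACK ? MIN_ALT_STACK : (size_t) sz;
}

/* Allocate and install an alternate signal stack; 0 on success. */
static int install_alt_stack(void)
{
    stack_t st;
    size_t size = alt_stack_size();

    st.ss_sp = malloc(size);
    if (st.ss_sp == NULL)
        return -1;
    st.ss_size = size;
    st.ss_flags = 0;
    if (sigaltstack(&st, NULL) != 0) {
        free(st.ss_sp);
        return -1;
    }
    return 0;                     /* buffer stays alive while installed */
}
```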

glib patch --> reference

From b71117d89434db83d34bc1b981ca03d4be299576 Mon Sep 17 00:00:00 2001
From: Khem Raj <[email protected]>
Date: Thu, 8 Jul 2021 17:26:43 -0700
Subject: [PATCH] correctly use 3 parameters for close_range

libc implementation has 3 parameter e.g.
https://www.freebsd.org/cgi/man.cgi?query=close_range&sektion=2&format=html

Signed-off-by: Khem Raj <[email protected]>
---
 glib/gspawn.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/glib/gspawn.c b/glib/gspawn.c
index 899647c2f8..3073a10a42 100644
--- a/glib/gspawn.c
+++ b/glib/gspawn.c
@@ -1520,7 +1520,7 @@ safe_closefrom (int lowfd)
    *
    * Handle ENOSYS in case it’s supported in libc but not the kernel; if so,
    * fall back to safe_fdwalk(). */
-  if (close_range (lowfd, G_MAXUINT) != 0 && errno == ENOSYS)
+  if (close_range (lowfd, G_MAXUINT, 0) != 0 && errno == ENOSYS)
 #endif  /* HAVE_CLOSE_RANGE */
   (void) safe_fdwalk (close_func, GINT_TO_POINTER (lowfd));
 #endif
-- 
GitLab
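
The glib patch only changes the call to match the three-parameter close_range(first, last, flags) signature that libc provides. A hedged C sketch of the same "close every fd from lowfd upward" logic, assuming glibc 2.34+ for the close_range() wrapper. The `__GLIBC_MINOR__` test is a stand-in for glib's HAVE_CLOSE_RANGE configure check, close_from() is a hypothetical name, and the fallback is a simple capped loop, not glib's safe_fdwalk().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <unistd.h>

/* Stand-in for glib's HAVE_CLOSE_RANGE: glibc declares close_range()
 * in <unistd.h> (with _GNU_SOURCE) from version 2.34 onward. */
#if defined(__GLIBC__) && \
    (__GLIBC__ > 2 || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 34))
#define HAVE_CLOSE_RANGE_WRAPPER 1
#endif

/* Simplified fallback bound; glib walks the open fds instead. */
#define FALLBACK_FD_LIMIT 65536L

/* Close every file descriptor >= lowfd. */
static void close_from(int lowfd)
{
    assert(lowfd >= 0);
#ifdef HAVE_CLOSE_RANGE_WRAPPER
    /* Three parameters: first fd, last fd, flags (0 = plain close).
     * Like gspawn.c, only fall back when the call fails with ENOSYS. */
    if (close_range((unsigned int) lowfd, UINT_MAX, 0) == 0 || errno != ENOSYS)
        return;
#endif
    long max = sysconf(_SC_OPEN_MAX);
    if (max < 0 || max > FALLBACK_FD_LIMIT)
        max = FALLBACK_FD_LIMIT;
    for (long fd = lowfd; fd < max; fd++)
        (void) close((int) fd);
}
```

Calling the two-argument form against a libc that declares three parameters is a compile error, which is what the upstream patch fixes.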

pretoriano80, Mar 30 '22 12:03

I've found another issue: this time the Generic x86/64 build fails due to a digital_devices cross-compile error.

  • Build system: Ubuntu 20.04 aarch64
  • LibreELEC: LE10
  • Project: Generic x86/64
  • Logs: 139.log

Adding the following to package.mk fixes the above error, but the build then fails with another one:

pre_make_target() {
  # Only on an aarch64 build host (uname -m reports the host machine name)
  if [ "$(uname -m)" = "aarch64" ]; then
    export ARCH=arm64
  fi
}

Log --> 138.log
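
The package.mk snippet works around the fact that `uname` reports the host as aarch64, while the Linux kernel build system names that architecture arm64. A small C sketch of that name mapping follows; kernel_arch() and host_kernel_arch() are hypothetical helpers that cover only the architectures mentioned in this thread. Keep in mind that the kernel's ARCH variable normally names the architecture being built for, not the build host.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/utsname.h>

/* Hypothetical helper: map a `uname -m` machine name to the Linux
 * kernel's ARCH name (the arch/<name>/ directory). Only covers the
 * architectures discussed in this thread; returns NULL otherwise. */
static const char *kernel_arch(const char *machine)
{
    if (strncmp(machine, "aarch64", 7) == 0 || strcmp(machine, "arm64") == 0)
        return "arm64";            /* uname says aarch64, kernel says arm64 */
    if (strncmp(machine, "arm", 3) == 0)
        return "arm";              /* e.g. armv7l */
    if (strcmp(machine, "x86_64") == 0 ||
        (strlen(machine) == 4 && machine[0] == 'i' &&
         strcmp(machine + 2, "86") == 0))
        return "x86";              /* x86_64, i386, i686 */
    return NULL;
}

/* Hypothetical helper: kernel ARCH name of the machine this code runs
 * on (the build host), or NULL if uname() fails or it is unmapped. */
static const char *host_kernel_arch(struct utsname *u)
{
    if (uname(u) != 0)
        return NULL;
    return kernel_arch(u->machine);
}
```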

If you think it's more appropriate, I will open another bug report.

pretoriano80, Mar 30 '22 16:03

@heitbaum, any idea how to proceed with these issues? Or is there no point in merging this into the "old" LE10 branch?

Also, any idea about the digital_devices error? It seems similar to the one we had with grub on an aarch64 host.

pretoriano80, Apr 02 '22 10:04

Having a look at this now. I'm guessing the build error is one of two things.

  • I think the bumps to m4 and glib in master fixed the issue you are seeing when compiling “x11” on aarch64:

    • m4: 3a8f590f8f50f8641be91106b3222219ce269400
    • glib: a number of fixes have been dropped in master that weren’t backported (https://github.com/LibreELEC/LibreELEC.tv/commits/eb57034c3fd4d28bb5d78f25881ddef28b48fd53/packages/devel/glib). But #5276, #5661 and #6024 (and the meson updates) are probably key to it working.

A number of issues were worked through (e.g. grub, rust) where host and target builds weren’t properly separated. Some were found during the glibc bump, when glibc on the target was newer than on the build host. A couple of identified Python3 ones remain (they are in the issues). I'm pretty sure all target builds, even for the same triple, are now done as cross-compiles. Note: there is an issue where rust uses /usr/bin/cc during its build :-(

Backporting the discussed packages/build environment and/or the patches to LE10 would need to be evaluated on risk versus reward. Carrying them as additional patches in your own build branch for use and testing is fine and should work. Many of the fixes discussed above were submitted as PRs before LE10, but were not included at the time given the potential impact on the tested/stable release.

I would like to get to the bottom of the digital_devices one (if it is indeed present in LE11), so DEVICE=Generic-legacy will need to be tested on the aarch64 build server. Otherwise, it must have been addressed by the LE11 PRs above.

heitbaum, Apr 02 '22 12:04

The digital_devices error only occurs on LE10 IIRC, but I can test LE11 Generic x86/64 if necessary. BTW, do you still have that Oracle Ubuntu aarch64 instance? If not, I can give you access to mine.

pretoriano80, Apr 02 '22 13:04

Last night I tested a Generic x86/64 LE11 build on Ubuntu 20.04 aarch64, and it built without any errors. So digital_devices only fails on LE10.

pretoriano80, Apr 03 '22 06:04

So the issue was probably resolved by one of #5276, #5661 and #6024, or by other toolchain updates.

heitbaum, Apr 03 '22 07:04

DVB driver addons are disabled in LE11 builds for as long as the kernel is being updated.

mglae, Apr 03 '22 13:04

That explains why I don't get that error on LE11 xD

pretoriano80, Apr 03 '22 14:04

  • Closing, as #7926 is the only known build error on aarch64.

heitbaum, Jun 17 '23 23:06