[RFC] fix crossbuild prefix pollution

Open robertkirkman opened this issue 1 year ago • 7 comments

Fixes #20336 and progress on #21130

Fixes

  • ERROR: Failed running '/data/data/com.termux/files/usr/bin/llvm-config', binary or interpreter not executable.
  • Interpreter: Cannot run the interpreter "/data/data/com.termux/files/usr/bin/python3"
    • Similar errors involving around 15 to 20 other binaries that are not python3, such as sed and xsltproc.

in mesa + a large number of other packages that I forgot, when built from entrypoint scripts/run-docker.sh ./build-all.sh

Fixes many errors similar to

  • nasmlib/md5c.c:46:10: error: no member named 'buf' in 'struct MD5Context'

in

  • nasm
  • bitlbee
  • emacs
  • gdb
  • libjxl
  • libmediainfo
  • maxcso
  • nodejs
  • pipewire
  • rust
  • sqlcipher
  • tome2
  • weggli
  • zip
  • possibly more like handbrake

Fixes

  • sl.c:51:10: fatal error: 'curses.h' file not found

in sl and any other package that requires curses.h, when built from entrypoint scripts/run-docker.sh ./build-all.sh.

I have not finished testing all the codepaths I want to test these changes on, but I would like to know, do you know any better solutions to these errors or better ways to write these changes?

The commit guidelines say you have an 80 column limit, but the same file I edited already has multiple 120+ column lines, leaving me confused. Should I reformat the entire file to fit in 80 columns?

Also, the commit guidelines do not have very many directions regarding changes to the scripts folder, leaving me unsure whether I'm following the correct coding style in general for this folder.

Possibly preexisting alternative solution

I strongly suspect that termux-play-store/termux-packages works around or bypasses many of these errors using a very different codepath. It is possible that fornwall prefers that method over this one, so if that repo is going to be merged into this repo someday, it may be better to use that solution instead.

robertkirkman avatar Oct 16 '24 21:10 robertkirkman

Regarding https://github.com/termux-play-store/termux-packages/commit/2bef6d4591e1ab5e0ba3e588a5ce19559bdd3d1e - always starting $PREFIX from a clean slate and installing dependencies from .deb/.pkg packages (regardless of whether they are built locally or not) - I think that's a good approach in general, to simplify reasoning about builds and make them more reproducible, instead of the build result depending on what might be left behind from previous (related or unrelated) builds. It currently takes some shortcuts and needs to be cleaned up and generalized to pacman and more build options, but I intend to submit a PR for discussion soon.

But regardless, this PR is interesting! It's not actually clear to me why we see a lot of build errors such as the ones above:

  • ERROR: Failed running '/data/data/com.termux/files/usr/bin/llvm-config', binary or interpreter not executable.
  • Interpreter: Cannot run the interpreter "/data/data/com.termux/files/usr/bin/python3"

when building with ./build-all.sh, but we do not get the error when building the package directly with ./build-package.sh -i <pkg>. Is that because (in the above two examples of errors) the dependencies do not include $PREFIX/bin/llvm-config or $PREFIX/bin/python3, so they are just not there interfering with the build when building with -i?

And in packages that do include cross-compiled $PREFIX/bin/llvm-config or $PREFIX/bin/python3 installed through package dependencies, so that the files are there when building with -i, the build just happens to work (or we have worked around it in the build, like specifying -DPYTHON_EXECUTABLE=$(command -v python3) to cmake, to pick up the host python3 instead of the cross-compiled one)?

Put another way: If we migrate build-all.sh to always start out with a clean $PREFIX and install dependencies from locally built packages when building each package, are the changes in this PR necessary and/or desirable?

fornwall avatar Oct 16 '24 22:10 fornwall

Is that because (in the above two examples of errors) the dependencies do not include $PREFIX/bin/llvm-config or $PREFIX/bin/python3, so they are just not there interfering with the build when building with -i?

Correct for the python3 example and some other similar errors this fixes, but in the specific case of llvm-config, it is a subtly different problem that was reported in #20336, where I also explained it in some additional detail. Basically, there is a condition if [ "$TERMUX_INSTALL_DEPS" = false ] that was causing an error for me and other people, including, it would seem, the people in #21130. Do you know what the original purpose of that condition is and what package it's needed for? For me it seems to work better when I remove it. EDIT: the condition is very old and dates back to what seems like the original support for building mesa https://github.com/termux/termux-packages/commit/b6980935d41da4cdd5a8222fc022a8c9b7e3fc36 . Since mesa cannot be cross-compiled when TERMUX_INSTALL_DEPS is false unless the condition is removed, I think it is safe for me to suggest it should be removed.

And in packages that do include cross-compiled $PREFIX/bin/llvm-config or $PREFIX/bin/python3 installed through package dependencies, so that the files are there when building with -i, the build just happens to work (or we have worked around it in the build, like specifying -DPYTHON_EXECUTABLE=$(command -v python3) to cmake, to pick up the host python3 instead of the cross-compiled one)?

I'm pretty sure that is correct, yes, but it should be noted that python3 is one example out of a large number of incompatible binaries that cause near-identical errors when building other packages. So it would not be possible to fix every affected package by using just -DPYTHON_EXECUTABLE=$(command -v python${TERMUX_PYTHON_VERSION}).
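
One way to see which files in a crossbuild prefix cannot run on the build host is to read the ELF header of each one. The following is a minimal sketch, not code from this PR: the function names elf_machine and list_foreign_binaries are made up for illustration, and only the four architectures Termux targets are recognized. It reads the 2-byte e_machine field at offset 18 of the ELF header with od. Scripts whose shebang points into the prefix (for example at $PREFIX/bin/python3) are not ELF files, so this check does not catch them.

```shell
# Sketch only: report the ELF machine type of a file.
# elf_machine and list_foreign_binaries are hypothetical helper names.
elf_machine() {
	# A file is ELF only if its first four bytes are 0x7f 'E' 'L' 'F'.
	magic=$(od -An -tx1 -N4 "$1" 2>/dev/null | tr -d ' \n')
	if [ "$magic" != "7f454c46" ]; then
		echo "not-elf"
		return 0
	fi
	# e_machine lives at offset 18 and is little-endian on these targets.
	code=$(od -An -tx1 -j18 -N2 "$1" | tr -d ' \n')
	case "$code" in
		3e00) echo "x86_64" ;;
		0300) echo "i686" ;;
		b700) echo "aarch64" ;;
		2800) echo "arm" ;;
		*) echo "unknown($code)" ;;
	esac
}

# Print every ELF file in a directory whose machine type differs from
# the given host architecture name.
list_foreign_binaries() {
	dir=$1
	host=$2
	for f in "$dir"/*; do
		[ -f "$f" ] || continue
		m=$(elf_machine "$f")
		case "$m" in
			not-elf|"$host") ;;
			*) echo "$f: $m" ;;
		esac
	done
}

# Example call (path assumed to be the usual Termux prefix):
# list_foreign_binaries /data/data/com.termux/files/usr/bin x86_64
```

Inside the builder container, any line this prints names a binary that a package's build system could pick up from $PREFIX/bin and then fail to execute.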

Put another way: If we migrate build-all.sh to always start out with a clean $PREFIX and install dependencies from locally built packages when building each package, are the changes in this PR necessary and/or desirable?

No, if we wait for your PR and go with that, I'm pretty sure it avoids all errors caused by the prefix having other packages installed into it previously (at a very small, surely insignificant speed cost incurred by reconstructing the prefix every time a package builds).

I use Gentoo, so every time I update my PC, every new package installed is recompiled directly using the same rootfs that I have kept installed since I installed the OS. I guess my method in this PR is just inspired by that, to be able to continuously keep rebuilding and reinstalling all Termux packages in an endless loop, into the same crossbuild prefix, by attempting to universally solve all potential errors that can occur when building the repository that way.

robertkirkman avatar Oct 16 '24 22:10 robertkirkman

WIP: changes to $TERMUX_PREFIX/lib folder

Before I put code for $TERMUX_PREFIX/lib folder conflicts into this PR, I'm going to think about possibly rewriting some code in or nearby affected packages, or coming up with even better solutions for when there are Library file conflicts - so the solutions explained in this comment will change before I commit them here.

There can sometimes be cases where a "special" package (package that does not fit exactly into the same traditional C or C++ toolchain use pattern as most of the packages) has a build system that we would colloquially say is "confused" by the presence of ARM and/or bionic-libc-linked libraries that it does not inherently have the ability to ignore. Two examples are the zig and rust packages:

zig

"3 stages bootstrapping build system" that produces a large 100+ Megabyte, statically-linked, seemingly musl-libc-based executable with no dependency on bionic libc

error: ld.lld: /data/data/com.termux/files/usr/lib/libncursesw.so is incompatible with elf64-x86-64

As one example of a possible preexisting workaround, I noticed that the existing code for the hostbuild step of bootstrapping xmake looks directly analogous.

https://github.com/termux/termux-packages/blob/9a344926f9e0138ddd7ea814d9cb53749bd6b43f/scripts/build/setup/termux_setup_xmake.sh#L47-L53

I might copy that but I might also continue trying to think of other ways to implement that type of workaround.

Update: In my local repo, I have expanded the xmake-related example code above to also apply to 16 other hostbuild-steps and musl-packages including zig, and it has worked to fix error: ld.lld: /data/data/com.termux/files/usr/lib/libncursesw.so is incompatible with elf64-x86-64 and build a zig package successfully. I can't test this with cross-compilation disabled since the unmodified zig package does not seem to build ondevice at the moment (with a different error), but I will eventually cycle around to testing the 16 other packages I applied the same fix to and make sure that they still cross-compile and also benefit from the prefix pollution prevention of the fix. If they all seem to work then I will upload that code.

I used helpful comments like these to learn about the current status of the Zig toolchain in Termux, and hopefully they helped me write a slightly more robust "isolation mode" to prevent all bionic libc libraries from being exposed to Zig code for the time being.

https://github.com/termux/termux-packages/blob/ed5bcd4b975dce30da0661d8e59b9668b13dbe87/packages/ncdu2/build.sh#L29

https://github.com/termux/termux-packages/blob/ed5bcd4b975dce30da0661d8e59b9668b13dbe87/scripts/build/setup/termux_setup_zig.sh#L3

The third currently enabled Zig package, zls, doesn't currently have any isolation so if it continues building and working I won't modify it. Maybe that one is just easier to compile than other Zig components since it builds with zig instead of make.

rust

depends directly at runtime on bionic libc and the normal Termux copy of libllvm, but has a long build script with heavy patching that might be easily impacted by the crossbuild prefix state.

This, and a few similar changes, were required to avoid more errors, due to the eventual spontaneous appearance of a libz.a:

--- a/packages/cargo-c/build.sh
+++ b/packages/cargo-c/build.sh
@@ -41,11 +41,14 @@ termux_step_pre_configure() {
 
 	mv $TERMUX_PREFIX/lib/libz.so.1{,.tmp}
 	mv $TERMUX_PREFIX/lib/libz.so{,.tmp}
+	mv $TERMUX_PREFIX/lib/libz.a{,.tmp}
 
 	ln -sfT $(readlink -f $TERMUX_PREFIX/lib/libz.so.1.tmp) \
 		$_CARGO_TARGET_LIBDIR/libz.so.1
 	ln -sfT $(readlink -f $TERMUX_PREFIX/lib/libz.so.tmp) \
 		$_CARGO_TARGET_LIBDIR/libz.so
+	ln -sfT $(readlink -f $TERMUX_PREFIX/lib/libz.a.tmp) \
+		$_CARGO_TARGET_LIBDIR/libz.a
 
 	if [[ "${TERMUX_ARCH}" == "x86_64" ]]; then
 		RUSTFLAGS+=" -C link-arg=$($CC -print-libgcc-file-name)"
@@ -55,9 +58,11 @@ termux_step_pre_configure() {
 termux_step_post_make_install() {
 	mv $TERMUX_PREFIX/lib/libz.so.1{.tmp,}
 	mv $TERMUX_PREFIX/lib/libz.so{.tmp,}
+	mv $TERMUX_PREFIX/lib/libz.a{.tmp,}
 }
 
 termux_step_post_massage() {
 	rm -f lib/libz.so.1
 	rm -f lib/libz.so
+	rm -f lib/libz.a
 }

[!IMPORTANT] The reason I believe it would be very desirable to find a way to replace the lines that look like mv $TERMUX_PREFIX/lib/libz.so.1{,.tmp} with a different method is because, for example, if rust or cargo-c fails to build, but is skipped and someone tries to build pypy afterward, pypy will fail to build with ImportError: unable to load extension module '/home/builder/.termux-build/pypy/src/lib_pypy/_tkinter/tklib_cffi.pypy-73.so': dlopen failed: library "libz.so.1" not found because the libz.so and libz.so.1 have remained renamed to libz.so.tmp and libz.so.1.tmp, respectively.
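
One possible way to avoid leaving the prefix in that renamed state is to register the restore step with trap, so that it runs even when the build aborts. This is only a sketch under assumptions: hide_libs, restore_libs and HIDDEN_LIBS are hypothetical names, not existing termux-packages functions, the paths are assumed to contain no whitespace, and I have not checked how an EXIT trap would interact with any traps build-package.sh already sets.

```shell
# Sketch only: hide files for the duration of a build and always restore
# them. hide_libs, restore_libs and HIDDEN_LIBS are hypothetical names.
HIDDEN_LIBS=""

hide_libs() {
	for lib in "$@"; do
		# Also handle symlinks, since libz.so is typically one.
		if [ -e "$lib" ] || [ -L "$lib" ]; then
			mv "$lib" "$lib.tmp"
			HIDDEN_LIBS="$HIDDEN_LIBS $lib"
		fi
	done
}

restore_libs() {
	# Word splitting on HIDDEN_LIBS is intentional (no whitespace in paths).
	for lib in $HIDDEN_LIBS; do
		if [ -e "$lib.tmp" ] || [ -L "$lib.tmp" ]; then
			mv "$lib.tmp" "$lib"
		fi
	done
	HIDDEN_LIBS=""
}

# Run the restore on any exit of this shell, including a failed build step.
trap restore_libs EXIT
```

With something like this, a pre-configure step would call hide_libs "$TERMUX_PREFIX/lib/libz.so.1" "$TERMUX_PREFIX/lib/libz.so" "$TERMUX_PREFIX/lib/libz.a", and a later package such as pypy would still find libz.so.1 even if the Rust build in between had failed.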

Notes: Affects cargo-c. Affects findomain. Affects librav1e. (#20100) The workaround in termux-packages is currently to nuke (backup and restore) libz.so or others like libssl.so, libcrypto.so every time. I will try to figure out what is actually going on that causes this and figure out if there is any possible way to globally prevent this type of error from happening to Rust crates during cross-compilation while compromising on code cleanliness as little as possible.

Update: Regarding error in build-all.sh -> findomain:

  • Primary form of findomain error (unset OPENSSL_NO_VENDOR):
ld: error: undefined symbol: libandroid_shmget
          >>> referenced by rand_unix.c:445 (providers/implementations/rands/seeding/rand_unix.c:445)
          >>>               libdefault-lib-rand_unix.o:(ossl_pool_acquire_entropy) in archive /home/builder/Findomain/target/debug/deps/libopenssl_sys-e38657f816837a73.rlib
          >>> referenced by rand_unix.c:479 (providers/implementations/rands/seeding/rand_unix.c:479)
          >>>               libdefault-lib-rand_unix.o:(ossl_pool_acquire_entropy) in archive /home/builder/Findomain/target/debug/deps/libopenssl_sys-e38657f816837a73.rlib
# ...(other libandroid_shmxx lines, similar)
ld: error: /data/data/com.termux/files/usr/lib/libssl.so is incompatible with elf64-x86-64

Here is part of a trail I am attempting to follow. I don't yet know how to quickly reach the TERMUX_PREFIX state that is a prerequisite for reproducing this error, other than running build-all.sh and letting it proceed through all packages before it reaches findomain (just building the openssl package first in the same prefix is not enough; there is some additional unknown factor). When I get there, though, the error can be bypassed, possibly temporarily and not yet in a way that can be abstracted to other packages, by doing this:

--- /dev/null
+++ b/packages/findomain/bump-headless-chrome-dep-to-newest-stable.patch
@@ -0,0 +1,11 @@
+--- a/Cargo.toml
++++ b/Cargo.toml
+@@ -22,7 +22,7 @@ rand = "0.8.5"
+ postgres = "0.19.7"
+ rayon = "1.7.0"
+ config = { version = "0.11.0", features = ["yaml", "json", "toml", "hjson", "ini"] }
+-headless_chrome = { git = "https://github.com/atroche/rust-headless-chrome", rev = "61ce783806e5d75a03f731330edae6156bb0a2e0" }
++headless_chrome = "1.0.15"
+ addr = "0.15.6"
+ serde_json = "1.0.108"
+ rusolver = { git = "https://github.com/Edu4rdSHL/rusolver", rev = "cf75cafee7c9d0c257c0b5a361441efc4e247e9c" }

I discovered that one aspect of the bug, or a subset of the bug, is fixed or bypassed upstream in https://github.com/atroche/rust-headless-chrome/commit/cd03ad9084c381e4fc089d50ca06008f94b1f45f . What I mean by that is, it seems like any Rust package that uses Cargo to pull in and download and build the source code of dependency Crates, that happens to pull in the headless_chrome crate at any commit before that commit, produces the right alignment of factors to make the error possible. I see that in that commit the change is bumping the auto_generate_cdp dep from 0.3.4 to 0.4.0 and removing some "features" related to native-tls and rustls from the Cargo.toml. That definitely seems like it has a relationship with this error in some way, but I haven't found what would satisfy me as a "true root cause" yet.

docker is a required dependency, possibly for one reason because of crossbuild host OS rootfs pollution

A sort of "fantasy stretch goal" and a natural logical progression of this type of coding, is for me to clean it up so much, that I could optionally take the builder out of the docker container by using sudo mkdir /data && sudo chown -R $(whoami) /data outside of docker, and actually patching all packages such that they don't pollute the build host's actual /usr or /usr/local folders, allowing ./build-package.sh and ./build-all.sh to be run in cross-compiling mode without root! When I checked that a while ago though, it looks like a lot more work to me than just the regular goal stated by this PR, so I won't attempt that first and probably won't have time to go that far.

So, a TL;DR summary of what's going on here is,

  • fixing the $TERMUX_PREFIX/bin folder for endless cross-compiling of all packages
  • fixing the $TERMUX_PREFIX/include folder for endless cross-compiling of all packages
  • TODO: fix the $TERMUX_PREFIX/lib folder for endless cross-compiling of all packages

robertkirkman avatar Oct 16 '24 22:10 robertkirkman

nasmlib/md5c.c:46:10: error: no member named 'buf' in 'struct MD5Context'

I think I recall seeing a very similar error with handbrake as well. Couldn't pin it down there, good to know that this is the cause.

TomJo2000 avatar Oct 17 '24 07:10 TomJo2000

I just remembered we have an open PR for updating the NDK 23c patches. I have a feeling that might have implications for this PR or vice versa.

I just hadn't had the time to review the earlier PR so far.

  • #21499

TomJo2000 avatar Oct 17 '24 07:10 TomJo2000

I just remembered we have an open PR for updating the NDK 23c patches. I have a feeling that might have implications for this PR or vice versa.

I just hadn't had the time to review the earlier PR so far.

  • Update NDK 23c patches #21499 (https://github.com/termux/termux-packages/pull/21499)

The reason for changing -I to -isystem in those files is that, according to the clang documentation, "if there are multiple -I options, these directories are searched in the order they are given before the standard system directories are searched. If the same directory is in the SYSTEM include search paths, for example if also specified with -isystem, the -I option will be ignored".

That means that:

  • before my change, /data/data/com.termux/files/usr/include was searched by the cross-compiler for headers before attempting to detect headers that were also specified with -I within the build system internal to each package being compiled. That was the cause of nasmlib/md5c.c:46:10: error: no member named 'buf' in 'struct MD5Context' (because the md5.h and other headers from libmd package [and other affected packages] were being wrongly included into the builds of all packages that contain their own internal md5.h [or other conflicting headers]).
  • after my change, /data/data/com.termux/files/usr/include is searched by the cross-compiler for headers after all instances of -I passed by the build system internal to the package are searched. I believe this consistently results in more reliable header behavior.
  • As a side note: I'm pretty sure that the reason why the cross-compiler was affected by these errors but non-cross-compiling mode seemed unaffected is because libllvm/clang for non-cross-compiling were built with things like "-DDEFAULT_SYSROOT=$(dirname $TERMUX_PREFIX/)". I think that probably put /data/data/com.termux/files/usr/include into an internal equivalent of -isystem built into the custom clang package for non-cross-compiling. The cross-compiler normally used during cross-compiling most likely wasn't precompiled to do the same thing.
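
The ordering described in the bullets above can be reproduced outside of Termux with any host C compiler. This is a small sketch under assumptions: the directory layout and the MD5_ORIGIN macro are made up for the demonstration, and the compiler is taken from $CC or falls back to cc. The prefix directory is passed first and the package's own directory second, which is the order the failing builds ended up with.

```shell
# Sketch only: a prefix include dir given with -I shadows a package's own
# header, while -isystem lets the package's -I win. Names are made up.
demo=$(mktemp -d)
mkdir -p "$demo/prefix/include" "$demo/pkg/include"
echo '#define MD5_ORIGIN from_prefix'  > "$demo/prefix/include/md5.h"
echo '#define MD5_ORIGIN from_package' > "$demo/pkg/include/md5.h"
printf '#include <md5.h>\nMD5_ORIGIN\n' > "$demo/test.c"

which_header() {
	# -E preprocesses only; -P drops line markers, so the last non-empty
	# output line is the expanded macro.
	"${CC:-cc}" -E -P "$@" "$demo/test.c" | grep -v '^$' | tail -n 1
}

if command -v "${CC:-cc}" >/dev/null 2>&1; then
	# Toolchain flag first, package flag second.
	with_I=$(which_header -I"$demo/prefix/include" -I"$demo/pkg/include")
	with_isystem=$(which_header -isystem "$demo/prefix/include" -I"$demo/pkg/include")
	echo "-I prefix:       $with_I"
	echo "-isystem prefix: $with_isystem"
fi
```

With -I on both directories the header from the prefix should be picked; once the prefix is passed with -isystem, the package's own copy should be found first, which is the behavior the change relies on.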

https://github.com/termux/termux-packages/blob/82ee6f86cd7d64ae08d7000cb18d8ad073612a58/packages/libllvm/build.sh#L40

robertkirkman avatar Oct 17 '24 12:10 robertkirkman

@xtkoba one of my changes would permit reverting

  • https://github.com/truboxl/termux-packages/commit/ff4d9a610e1f116e5b62e6dc2acebad280310141

without build failure occurring. Do you think my method looks like a good replacement for yours, and have you ever happened to see any other similar errors or old workarounds that you think might be relevant to this? I might find more as I continue reading.

Validation:

git clone https://github.com/termux/termux-packages.git
cd termux-packages/
gh pr checkout 21835
patch -p1 << 'EOF'
--- a/packages/ghostscript/build.sh
+++ b/packages/ghostscript/build.sh
@@ -31,14 +31,6 @@ termux_step_post_get_source() {
 termux_step_pre_configure() {
 	CPPFLAGS+=" -I${TERMUX_STANDALONE_TOOLCHAIN}/sysroot/usr/include/c++/v1"
 
-	# Workaround for build break caused by `sha2.h` from `libmd` package:
-	if [ -e "$TERMUX_PREFIX/include/sha2.h" ]; then
-		local inc="$TERMUX_PKG_BUILDDIR/_include"
-		mkdir -p "${inc}"
-		ln -sf "$TERMUX_PKG_SRCDIR/base/sha2.h" "${inc}/"
-		CPPFLAGS="-I${inc} ${CPPFLAGS}"
-	fi
-
 	if [[ "${TERMUX_ARCH}" == "aarch64" ]]; then
 		# https://github.com/llvm/llvm-project/issues/74361
 		# NDK r27: clang++: error: unsupported option '-mfpu=' for target 'aarch64-linux-android24'
EOF
scripts/run-docker.sh ./build-package.sh -f libmd ghostscript

And as a reminder, for clarity, I am pretty sure that since it is an alternative implementation of a solution to the same problem, if a PR using code based on https://github.com/termux-play-store/termux-packages/commit/2bef6d4591e1ab5e0ba3e588a5ce19559bdd3d1e eventually comes to this repo, it would also allow reverting that commit.

robertkirkman avatar Oct 19 '24 12:10 robertkirkman

nasmlib/md5c.c:46:10: error: no member named 'buf' in 'struct MD5Context'

I think I recall seeing a very similar error with handbrake as well. Couldn't pin it down there, good to know that this is the cause.

It happens because libmd installs md5.h to $TERMUX_PREFIX/include, and there is -I$TERMUX_PREFIX/include in $CPPFLAGS, so $TERMUX_PREFIX/include has higher precedence than the include directories in build folders.

twaik avatar Oct 29 '24 06:10 twaik

@twaik I decided to continue the discussion and document some more test results regarding things like

  • --sysroot (I believe it affects lib + include folders),
  • -isysroot/-isystem (I believe these could affect the include folders only)
  • or -DDEFAULT_SYSROOT (affects a resulting clang program using the compile-time settings of the compiler being built)

here since this might be a more appropriate thread to talk about it, since I mentioned something about the sysroot already above this in an earlier comment here. The clang program that comes inside Termux itself was compiled with -DDEFAULT_SYSROOT, and the cross-compiler that comes from the official NDK was not, so one or two of the subtle differences in behavior between the cross-compiler and the Termux app can be pinpointed to this detail.

I did not invent all of the ideas my changes here are based on by myself. Instead, I happened to be observing the behavior of several toolchains and noticing what they do in various situations, and I have sometimes been taking inspiration from or copying what they do in order to fix some errors that happened when building packages in termux-packages.

Gentoo amd64 -> aarch64

Gentoo has a package called crossdev that allowed me to easily install a regular GNU/Linux cross-compiler for other projects that do not involve Android. It was compiled using a GCC equivalent to Clang's -DDEFAULT_SYSROOT, which sets the prefix to the correct folder for cross-compiling Gentoo packages. In recent years, Gentoo has also been adding Clang support to the crossdev package, so that might also be useful for comparison in some cases, though not relevant for every comparison, since right now it only targets GNU/Linux and something else called "aarch64-gentoo-linux-musl".

tacokoneko@CORSAIR ~ $ aarch64-none-linux-gnu-gcc -print-sysroot
/usr/aarch64-none-linux-gnu
tacokoneko@CORSAIR ~ $ ls /usr/aarch64-none-linux-gnu
etc  lib  lib64  sbin  sys-include  usr  var

Termux aarch64

--sysroot=/data/data/com.termux/files is present in the output of a clang debugging info command.

tacokoneko@CORSAIR ~ $ ssh -p 8022 192.168.12.191
tacokoneko@192.168.12.191's password: 
Welcome to Termux!

Docs:       https://termux.dev/docs
Donate:     https://termux.dev/donate
Community:  https://termux.dev/community

Working with packages:

 - Search:  pkg search <query>
 - Install: pkg install <package>
 - Upgrade: pkg upgrade

Subscribing to additional repositories:

 - Root:    pkg install root-repo
 - X11:     pkg install x11-repo

For fixing any repository issues,
try 'termux-change-repo' command.

Report issues at https://termux.dev/issues
~ $ echo '' | clang -x c - -v 2>&1 | grep -e "--sysroot="
 "/data/data/com.termux/files/usr/bin/ld.lld" --sysroot=/data/data/com.termux/files -EL --fix-cortex-a53-843419 -z now -z relro -z max-page-size=16384 --hash-style=gnu -rpath=/data/data/com.termux/files/usr/lib --eh-frame-hdr -m aarch64linux -pie -dynamic-linker /system/bin/linker64 -o a.out /data/data/com.termux/files/usr/lib/crtbegin_dynamic.o -L/data/data/com.termux/files/usr/lib -L/data/data/com.termux/files/usr/aarch64-linux-android/lib -L/system/lib64 /data/data/com.termux/files/usr/tmp/--88abe0.o /data/data/com.termux/files/usr/lib/clang/19/lib/linux/libclang_rt.builtins-aarch64-android.a -l:libunwind.a -ldl -lc /data/data/com.termux/files/usr/lib/clang/19/lib/linux/libclang_rt.builtins-aarch64-android.a -l:libunwind.a -ldl /data/data/com.termux/files/usr/lib/crtend_android.o
~ $ 
logout
Connection to 192.168.12.191 closed.

Termux package builder Docker image

On the other hand, that argument does not show up in this compiler because it is a prebuilt compiler that comes from a non-Termux source. This means that, technically, to precisely synchronize the exact literal behavior of the cross-compiler and the non-cross-compiler, either -DDEFAULT_SYSROOT must be removed from the build of the clang package, or -DDEFAULT_SYSROOT must be added to the build of a custom-built NDK specifically for cross-compiling Termux packages. That is just an example of an overly invasive solution, though. I think it is probably unnecessary to recompile the entire cross-compiler, and a similar result can probably be achieved by setting up the --sysroot argument or similar arguments passed to the current copy of the cross-compiler.

tacokoneko@CORSAIR ~ $ code/termux/electric-boogaloo/termux-packages/scripts/run-docker.sh 
Running container 'termux-package-builder' from image 'temporary-local-termux-package-builder-image'...
builder@bd15ad372c73:~/termux-packages$ echo '' | /home/builder/.termux-build/_cache/android-r27b-api-24-v1/bin//aarch64-linux-android-clang -x c - -v 2>&1 | grep -e "--sysroot="
builder@bd15ad372c73:~/termux-packages$ 

robertkirkman avatar Oct 29 '24 08:10 robertkirkman

AFAIK the NDK's clang adds the --sysroot argument automatically when you pass the --target argument.

twaik avatar Oct 29 '24 09:10 twaik

Good idea, though in the command I showed, the result is the same as the equivalent with --target, because the script in that folder that I invoked already contains --target. However, it might be a good idea for me to save the full result here that I see when I use this command within the docker container:

echo '' |  /home/builder/.termux-build/_cache/android-r27b-api-24-v1/bin/clang --target=aarch64-linux-android24 -x c - -v
Android (12285214, +pgo, +bolt, +lto, +mlgo, based on r522817b) clang version 18.0.2 (https://android.googlesource.com/toolchain/llvm-project d8003a456d14a3deb8054cdaa529ffbf02d9b262)
Target: aarch64-unknown-linux-android24
Thread model: posix
InstalledDir: /home/builder/.termux-build/_cache/android-r27b-api-24-v1/bin
 "/home/builder/.termux-build/_cache/android-r27b-api-24-v1/bin/clang-18" -cc1 -triple aarch64-unknown-linux-android24 -emit-obj -mrelax-all -dumpdir a- -disable-free -clear-ast-before-backend -disable-llvm-verifier -discard-value-names -main-file-name - -mrelocation-model pic -pic-level 2 -pic-is-pie -mframe-pointer=non-leaf -ffp-contract=on -fno-rounding-math -mconstructor-aliases -funwind-tables=2 -target-cpu generic -target-feature +neon -target-feature +v8a -target-feature +fix-cortex-a53-835769 -target-abi aapcs -debugger-tuning=gdb -fdebug-compilation-dir=/home/builder/termux-packages -v -fcoverage-compilation-dir=/home/builder/termux-packages -resource-dir /home/builder/.termux-build/_cache/android-r27b-api-24-v1/lib/clang/18 -internal-isystem /home/builder/.termux-build/_cache/android-r27b-api-24-v1/lib/clang/18/include -internal-isystem /home/builder/.termux-build/_cache/android-r27b-api-24-v1/bin/../sysroot/usr/local/include -internal-externc-isystem /home/builder/.termux-build/_cache/android-r27b-api-24-v1/bin/../sysroot/usr/include/aarch64-linux-android -internal-externc-isystem /home/builder/.termux-build/_cache/android-r27b-api-24-v1/bin/../sysroot/include -internal-externc-isystem /home/builder/.termux-build/_cache/android-r27b-api-24-v1/bin/../sysroot/usr/include -ferror-limit 19 -femulated-tls -fno-signed-char -fgnuc-version=4.2.1 -fcolor-diagnostics -target-feature +outline-atomics -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o /tmp/--eede17.o -x c -
clang -cc1 version 18.0.2 based upon LLVM 18.0.2 default target x86_64-unknown-linux-gnu
ignoring nonexistent directory "/home/builder/.termux-build/_cache/android-r27b-api-24-v1/bin/../sysroot/usr/local/include"
ignoring nonexistent directory "/home/builder/.termux-build/_cache/android-r27b-api-24-v1/bin/../sysroot/include"
#include "..." search starts here:
#include <...> search starts here:
 /home/builder/.termux-build/_cache/android-r27b-api-24-v1/lib/clang/18/include
 /home/builder/.termux-build/_cache/android-r27b-api-24-v1/bin/../sysroot/usr/include/aarch64-linux-android
 /home/builder/.termux-build/_cache/android-r27b-api-24-v1/bin/../sysroot/usr/include
End of search list.
 "/home/builder/.termux-build/_cache/android-r27b-api-24-v1/bin/ld.lld" -EL --fix-cortex-a53-843419 -z now -z relro -z max-page-size=4096 --hash-style=gnu --eh-frame-hdr -m aarch64linux -pie -dynamic-linker /system/bin/linker64 -o a.out /home/builder/.termux-build/_cache/android-r27b-api-24-v1/bin/../sysroot/usr/lib/aarch64-linux-android/24/crtbegin_dynamic.o -L/home/builder/.termux-build/_cache/android-r27b-api-24-v1/lib/clang/18/lib/linux/aarch64 -L/home/builder/.termux-build/_cache/android-r27b-api-24-v1/bin/../sysroot/usr/lib/aarch64-linux-android/24 -L/home/builder/.termux-build/_cache/android-r27b-api-24-v1/bin/../sysroot/usr/lib/aarch64-linux-android -L/home/builder/.termux-build/_cache/android-r27b-api-24-v1/bin/../sysroot/usr/lib /tmp/--eede17.o /home/builder/.termux-build/_cache/android-r27b-api-24-v1/lib/clang/18/lib/linux/libclang_rt.builtins-aarch64-android.a -l:libunwind.a -ldl -lc /home/builder/.termux-build/_cache/android-r27b-api-24-v1/lib/clang/18/lib/linux/libclang_rt.builtins-aarch64-android.a -l:libunwind.a -ldl /home/builder/.termux-build/_cache/android-r27b-api-24-v1/bin/../sysroot/usr/lib/aarch64-linux-android/24/crtend_android.o
ld.lld: error: undefined symbol: main
>>> referenced by crtbegin.c
>>>               /home/builder/.termux-build/_cache/android-r27b-api-24-v1/bin/../sysroot/usr/lib/aarch64-linux-android/24/crtbegin_dynamic.o:(_start_main)
>>> referenced by crtbegin.c
>>>               /home/builder/.termux-build/_cache/android-r27b-api-24-v1/bin/../sysroot/usr/lib/aarch64-linux-android/24/crtbegin_dynamic.o:(_start_main)
clang: error: linker command failed with exit code 1 (use -v to see invocation)

It seems to pass itself a large number of arguments that use a relative path to place the sysroot one folder above its own folder, i.e. /home/builder/.termux-build/_cache/android-r27b-api-24-v1/sysroot, but without any of them being exactly --sysroot.

It is not completely clear to me whether or not using --sysroot, or -isysroot, for example, would risk overwriting or mis-ordering the path this compiler sets for -internal-externc-isystem and causing an error if the headers in that path are needed, but I can continue testing and check whether there seems to be any risk of that.
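
To compare the effective header order before and after adding an argument like --sysroot or -isystem, the search list can be pulled out of saved clang -v output. The helper below is a hedged sketch: include_search_dirs is a made-up name, and it only relies on the "#include <...> search starts here:" and "End of search list." marker lines that appear in the log above.

```shell
# Sketch only: print the <...> include search directories, in order, from
# `clang -v` output read on stdin. include_search_dirs is a hypothetical name.
include_search_dirs() {
	awk '
		/^#include <\.\.\.> search starts here:/ { on = 1; next }
		/^End of search list\./                  { on = 0 }
		on { sub(/^[ \t]+/, ""); print }
	'
}

# Example use inside the builder container (command taken from the log above):
# echo "" | /home/builder/.termux-build/_cache/android-r27b-api-24-v1/bin/clang \
#	--target=aarch64-linux-android24 -x c - -v 2>&1 | include_search_dirs
```

Running it once without and once with the extra argument and diffing the two outputs would show whether the NDK sysroot directories keep their position relative to /data/data/com.termux/files/usr/include.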

robertkirkman avatar Oct 29 '24 09:10 robertkirkman

Probably -internal-externc-isystem has the lowest priority and is meant to be passed only by clang itself. Idk.

twaik avatar Oct 29 '24 09:10 twaik

Sorry if it is too much flooding to continue posting notes like this, but I just reached the proxmark3 package and would like to mention something. While my current -isystem argument fixes a lot of packages without any further modification to the package, some unique packages, at least for now, require their own toolchain setup variables, and the way some of them are set prevents my $CPPFLAGS from propagating into their builds. That means my solution cannot fix their manifestations of the libmd/other header errors without changing those packages a little bit as well.

https://github.com/termux/termux-packages/blob/ac069d232bbb71b95ce0f9405ed7abd4eb5b8aa5/packages/proxmark3/build.sh#L13-L17

# this is a shortcut to reproduce libmd-related errors like the one you are familiar with,
# without having to run the entire build-all.sh from scratch every time.
# note that it is not always libmd; sometimes it is other
# packages, so for similar errors, it's necessary to correctly identify the 
# package that comes first in the pollution order
# before it can be reproduced this way. In this specific case it is libmd.
scripts/run-docker.sh ./build-package.sh -I libmd proxmark3
  • src/cmdflashmem.c:81:5: error: call to undeclared function 'mbedtls_sha1'

However, what seems like an acceptable solution in this particular case is to adjust the package like this,

--- a/packages/proxmark3/build.sh
+++ b/packages/proxmark3/build.sh
@@ -11,8 +11,8 @@ TERMUX_PKG_BUILD_IN_SRC="true"
 TERMUX_PKG_BLACKLISTED_ARCHES="i686, x86_64"
 
 termux_step_post_configure() {
-       export LDLIBS="-L${TERMUX_PREFIX}/lib"
-       export INCLUDES="-I${TERMUX_PREFIX}/include"
+       export LDLIBS="$LDFLAGS"
+       export INCLUDES="$CPPFLAGS"
        TERMUX_PKG_EXTRA_MAKE_ARGS="client CC=$CC CXX=$CXX LD=$CXX cpu_arch=$TERMUX_ARCH SKIPREVENGTEST=1 SKIPQT=1 SKIPPTHREAD=1 SKIPGD=1 PLATFORM=PM3GENERIC"
 }

That allows the command scripts/run-docker.sh ./build-package.sh -I libmd proxmark3 to complete successfully for me, when combined with my other change in termux_setup_toolchain_27b.sh (it also allows me to resume and continue the same run of build-all.sh I was running where the error first appeared for me)

I'll probably commit that to this PR soon as my change for proxmark3.

robertkirkman avatar Oct 29 '24 11:10 robertkirkman

# note that it is not always libmd; sometimes it is other
# packages, so for similar errors, it's necessary to correctly identify the 
# package that comes first in the pollution order
# before it can be reproduced this way. In this specific case it is libmd.

It is easy. Make and ninja print the command which fails. You simply add termux's toolchain path to PATH and navigate to the build folder like

export PATH="$HOME/.termux-build/_cache/android-r27b-api-24-v1/bin:$PATH" # $HOME, because ~ does not expand inside double quotes
cd ~/.termux-build/nasm/src # because it builds in SRCDIR...

and invoke the failing clang command with -H in CFLAGS and it will print all the headers it includes with their real paths.

After this you only need to find the package providing the file with apt-file search /data/data/com.termux/files/usr/include/md5.h (in termux environment).
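To make the -H step a little quicker, here is a minimal sketch of a filter for its output. It assumes the usual -H format, where each included header is printed on its own line after one or more dots and a space. The function name headers_under_prefix is hypothetical and is not part of termux-packages.

```shell
# Hypothetical helper (not part of termux-packages).
# Reads compiler output produced with -H on stdin and prints, once each,
# the included headers whose path starts with the directory given as $1.
headers_under_prefix() {
    awk -v prefix="$1/" '
        /^\.+ / {
            sub(/^\.+ /, "")                      # drop the depth dots
            if (index($0, prefix) == 1) print     # keep paths under the prefix
        }
    ' | sort -u
}
```

For example, appending `-H` to the failing clang command and piping its output through `2>&1 | headers_under_prefix /data/data/com.termux/files/usr/include` should leave only the headers that came from the prefix, which are the candidates to pass to apt-file search.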

twaik avatar Oct 29 '24 12:10 twaik

That was really weird. I pushed a change to the dropbear package that seemed very small and fixed the build for me locally, but it failed in CI, and when I force pushed the exact same change again, it did not fail. The error in CI said /home/runner/work/termux-packages/termux-packages/packages/composer/build.sh: line 14: composer: command not found even though I did not do anything with the composer package. If my change to dropbear causes an intermittent failure in CI that would not be good, so I left a note about it.

This was the run that failed; when I deleted the commit, re-committed and force pushed without making any code changes, it succeeded. https://github.com/termux/termux-packages/actions/runs/11582453980

robertkirkman avatar Oct 29 '24 21:10 robertkirkman

Probably it is time for final review for this PR. We have some issue with updating/uploading a lot of packages at once.

twaik avatar Oct 30 '24 12:10 twaik

Probably it is time for final review for this PR. We have some issue with updating/uploading a lot of packages at once.

Ok, in that case I think I need to organize it into several separate PRs that are smaller and more pinpointed to the exact changes they are relevant to. For example, the whole large blue Note at the beginning could be separated out into its own PR, and the code associated with it is ready.

On the other hand, several of the other changes besides that one are not ready yet, so rather than continuing to lump all of my changes of the type that you usually describe as "build-all.sh changes" into this one, I need to separate them into their own specific categories.

One of the reasons I decided to mark my own build-all.sh changes as a draft is that, in my estimation, the only way to exhaustively detect and solve all present and even some future errors is to run the entire build-all.sh twice over the same $TERMUX_PREFIX. That means that I would consider some of these changes fully tested once my container has successfully compiled all packages at least once, then had its build status reset without resetting the whole prefix, and then successfully compiled all packages a second time.

The reason I believe that is that, as I understand it, the gradual replacement of dependencies in packages over time, as they receive updates, can subtly shift the build order calculated by build-all.sh. That could hypothetically lead to a future situation where a package that build-all.sh previously compiled after another package on the 1st run shifts to before it in the build order, exposing untested edge cases and potentially leading to more errors. I believe that if I run build-all.sh 2 entire times, it will allow me to find and preemptively prevent all of those potential future errors.
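To illustrate the build order concern, here is a rough sketch that compares two saved build orders and lists which packages newly come before a given package, i.e. the ones that could now pollute the prefix before it builds. It assumes each file has one package per line with the package name in the first field. The function name newly_preceding_packages and the file names are hypothetical.

```shell
# Hypothetical helper (not part of termux-packages).
# $1: older build order file, $2: newer build order file, $3: package name.
# Assumes one package per line, with the package name in the first field.
# Prints packages built before $3 in the newer order but not in the older one.
newly_preceding_packages() {
    awk -v target="$3" '
        FNR == 1 { file++; seen_target = 0 }      # a new input file starts
        $1 == target { seen_target = 1; next }
        seen_target { next }                      # ignore packages built after the target
        file == 1 { old_before[$1] = 1; next }    # predecessors in the older order
        !($1 in old_before) { print $1 }          # predecessors only in the newer order
    ' "$1" "$2"
}
```

For example, `newly_preceding_packages old-order.txt new-order.txt nasm` would print every package that is built before nasm only in the newer order. Running build-all.sh twice over the same prefix covers all of these cases at once, because on the 2nd run every package is already installed before any package builds.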

I definitely want to try to prevent as many potential errors in my code as possible by fully testing all cross-compiling codepaths in a way that works for any potential build order.

Also if you want to, it is completely OK to copy anything from this PR and put it into your own PRs if there is anything you want to use that you believe is ready and should be in the main repo faster.

robertkirkman avatar Oct 30 '24 13:10 robertkirkman

On the one hand you are right, it is good to commit all changes at once. On the other hand we can get into a situation where we have multiple fixes for the same problems from different developers, and all these developers spend a significant amount of time creating fixes (like in the situation with nasm and this PR).

twaik avatar Oct 30 '24 13:10 twaik

Regarding the viability of -isysroot, I just ran these experiments in termux_setup_toolchain_27b.sh

  • Package used for experiments: odt2txt

  • Experiment 1

export CPPFLAGS+=" -isysroot$(dirname $TERMUX_PREFIX/)"
aarch64-linux-android-clang -O2 -DHAVE_LIBZIP  -isysroot/data/data/com.termux/files  -c -o odt2txt.o odt2txt.c
aarch64-linux-android-clang -O2 -DHAVE_LIBZIP  -isysroot/data/data/com.termux/files  -c -o regex.o regex.c
aarch64-linux-android-clang -O2 -DHAVE_LIBZIP  -isysroot/data/data/com.termux/files  -c -o mem.o mem.c
aarch64-linux-android-clang -O2 -DHAVE_LIBZIP  -isysroot/data/data/com.termux/files  -c -o strbuf.o strbuf.c
odt2txt.c:20:12: fatal error: 'iconv.h' file not found
   20 | #  include <iconv.h>
      |            ^~~~~~~~~
In file included from strbuf.c:11:
./strbuf.h:15:10: fatal error: 'zlib.h' file not found
   15 | #include "zlib.h"
      |          ^~~~~~~~
In file included from regex.c:12:
In file included from ./regex.h:20:
./strbuf.h:15:10: fatal error: 'zlib.h' file not found
   15 | #include "zlib.h"
      |          ^~~~~~~~
1 error generated.
1 error generated.
make: *** [<builtin>: strbuf.o] Error 1
make: *** Waiting for unfinished jobs....
make: *** [<builtin>: regex.o] Error 1
1 error generated.
make: *** [<builtin>: odt2txt.o] Error 1
  • Experiment 2 (same error as above)
export CPPFLAGS+=" -isysroot$TERMUX_PREFIX"
  • Experiment 3 (same error as above)
export CPPFLAGS+=" -isysroot$TERMUX_PREFIX/include"
  • My currently planned change (success)
export CPPFLAGS+=" -isystem$TERMUX_PREFIX/include"

Based on this result, it seems, at least to me, like -isysroot might not be working in this situation as well as -isystem does, or that maybe -isysroot must be used differently or used in a different place in the code, before it can work. I will keep it as -isystem for now and maybe someone will have a good idea in the future about how to replace it with -isysroot if there is a way.
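For anyone who wants to compare what each experiment actually changes, here is a small sketch that extracts the `#include <...>` search list from the verbose output clang prints with -v. It relies on the "search starts here:" and "End of search list." marker lines. The function name angle_include_dirs is hypothetical.

```shell
# Hypothetical helper (not part of termux-packages).
# Reads verbose compiler output (from -v) on stdin and prints the directories
# of the `#include <...>` search list, in search order.
angle_include_dirs() {
    awk '
        /^#include <\.\.\.> search starts here:/ { in_list = 1; next }
        /^End of search list\./ { in_list = 0 }
        in_list { sub(/^[ \t]+/, ""); print }     # strip the leading indentation
    '
}
```

It could be used like `aarch64-linux-android-clang $CPPFLAGS -E -x c -v /dev/null 2>&1 | angle_include_dirs`, once with each variant of $CPPFLAGS, to see whether $TERMUX_PREFIX/include appears in the list and at which position.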

robertkirkman avatar Oct 31 '24 10:10 robertkirkman

If anyone worries that this one is taking too long, do not worry, I will finish it. I just stopped uploading it because of the CI concern. When certain parts are ready I will make them each as separate PRs.

There is a medium-sized block of code in the luarocks package that implements a localized variant of the same symlink workaround, which I feel could be written in a globalized way. I am combining it into my version in a way that simultaneously removes all symlink-specific code from the luarocks package, reduces the total number of lines of code dedicated to lua-specific symlink tasks from 12 lines to either a single string or no lines at all (depending on what I end up deciding), and prevents all of luarocks' possible manifestations of the Exec format error that the current version does not prevent. Also, if I did not combine it into my version and remove it from the source package, it would conflict. So the version of termux_step_override_config_scripts.sh seen here will be outdated for a little while.

There's another instance of a preexisting localized variant of the same symlink technique in the git source package. Since twaik wrote a short guide on how to find the root cause of the average type of error caused by include folder conflicts, here is a short guide on how to find the root cause of this type of bin folder conflict.

If the package's configure script fails with a message like this, it might be attempting to detect a binary that should be in $TERMUX_PREFIX/bin. You can try searching the code for $TERMUX_PREFIX/bin/ followed by that binary's name to look for older workarounds that followed this pattern.

[image: screenshot of the configure error message]

  • A possibly compelling argument in favor of my approach to $TERMUX_PREFIX/bin:

It should enable removing all lines that follow the same pattern that this line follows.

https://github.com/termux/termux-packages/blob/280b51c0b8dc1438e4f62baf0815ddeacd06fa6c/packages/git/build.sh#L80-L81

In the way I do it, the handling of the "build machine" binary symlink happens outside of and isolated from the package changed-file detection section of the code, meaning that a single instance of this symlink can be shared between several packages to serve the exact same purpose, e.g. git and rsnapshot without any worry about the file getting accidentally packaged.

It should be noted that git remains, for now, firmly a "not safe for on-device builds" package, due to the other parts of its build script. My change is a minor cleanup of it for cross-compilation mode.

Short summary explaining some edge cases. EDIT: both of these lists got very long when I tested building every single package, but the 2nd list seems shorter. See the 2nd branch linked below for the current status of my method for bypassing these errors, which could be way too messy, but might become robust if I continue iterating on it. I'll just try it and find out.

  • Binaries that cannot be blindly deleted and must be symlinked to /bin/true to avoid errors:
    • colm
    • protoc*
    • several others
  • Binaries that cannot be symlinked to /bin/true and must be fully deleted to avoid errors:
    • *ccache
    • lua5.4
    • llvm-config (appropriate, deleting it synchronizes well with the preexisting handling for it nearby in the same script)
    • pg_config (appropriate, deleting it synchronizes well with the preexisting handling for it nearby in the same script)
    • others

As I tally those up, I guess my choice of final fallback behavior will change to whichever one has the longer list, since making the fallback behavior match the majority of edge cases will minimize the number of packages that have to be explicitly named in a string.
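As a rough illustration only (this is not the code from my branch), the two lists plus a configurable fallback could be expressed like this. The function names are hypothetical, the patterns are just the names listed above (both lists are incomplete), and the fallback action is passed in explicitly because it is not decided yet.

```shell
# Hypothetical sketch, not the code from the PR branch.
# $1: name of a file in the prefix bin folder
# $2: fallback action ("stub" or "delete") for names that are not listed
classify_prefix_binary() {
    case "$1" in
        colm|protoc*) echo stub ;;                            # must be symlinked to /bin/true
        *ccache|lua5.4|llvm-config|pg_config) echo delete ;;  # must be fully deleted
        *) echo "$2" ;;
    esac
}

# $1: bin folder, $2: file name, $3: fallback action
neutralize_prefix_binary() {
    case "$(classify_prefix_binary "$2" "$3")" in
        stub) ln -sfn /bin/true "$1/$2" ;;
        delete) rm -f "$1/$2" ;;
        *) echo "unknown action for $2" >&2; return 1 ;;
    esac
}
```

With this shape, only the shorter of the two lists has to stay in the `case` patterns, and everything else is handled by whichever fallback is passed in.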

robertkirkman avatar Nov 02 '24 08:11 robertkirkman

I do not like to store too many changes only locally without backing them up in the cloud and other backups periodically, because I am afraid of storage corruption, so I am posting a snapshot of my current local changes here, and I will probably copy and paste the changes there that are not already separated into different PRs, into other PRs, once they are ready.

I noticed from previous discussion that the granularity and documentation of any potential changes to the file build-all.sh is a very high priority (as opposed to making too many changes to it at once in a single PR). Therefore, do not worry that the WIP build-all.sh I use for testing has too many lines changed in it simultaneously. I will be sure to open a separate, individual, consecutively numbered PR for every cluster of 2-3 lines changed in build-all.sh after it is finished, and will create my own alternative implementations of the buildorder.txt and buildstatus.txt files on an as-needed basis.

robertkirkman avatar Nov 04 '24 22:11 robertkirkman

Probably it will be better to split this into a few small PRs. This PR already has conflicts with the main branch.

twaik avatar Feb 07 '25 11:02 twaik

Yes I will. However, after I ran build-all.sh 2x in a row (meaning, attempting to build all packages in a single docker container 1 time and then, starting over from the first package, attempting to build all packages in the same docker container a 2nd time without deleting the docker container), I was able to successfully build most packages, but I had to skip about 300 packages due to unresolved prefix pollution, even with the current version of my changes (visible in the fix-crossbuild-prefix-pollution-2 branch).

So, I started prioritizing the packages that can't be built even 1 single time, in a clean docker container, like pypy3, first, since I thought those are the higher-priority ones. A side effect of running this long 2x build-all.sh session using the changes from fix-crossbuild-prefix-pollution-2, was that at the end of it, I also obtained a list of those packages that, at the time, did not build even 1 single time in a clean docker container, and posted it in the issue.

Since then, some of those have now been fixed, so pretty soon I will check all of those packages from the shorter list of those that couldn't build at all, and see if any are still remaining that can't be built, other than those that were moved to disabled-packages (like crypto-monitor was).

robertkirkman avatar Feb 07 '25 12:02 robertkirkman