fix failing build for tensorstore 0.1.72 when using RPATH by passing `$TMPDIR` from host into Bazel sandbox
(created using eb --new-pr)
fix for failing installation when using RPATH linking:
Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
src/main/tools/linux-sandbox-pid1.cc:548: "execvp(/tmp/eb-zilp5yc9/tmpr0fpmr63/rpath_wrappers/gcc_wrapper/gcc, 0x1d148c0)": No such file or directory
Target //python/tensorstore:_tensorstore__shared_objects failed to build
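The wrapper path in the `execvp` error lives under EasyBuild's temporary directory, which the sandboxed action apparently cannot see. Below is a minimal sketch of how the host's `$TMPDIR` could be exposed to the sandbox. It assumes Bazel's `--sandbox_add_mount_pair` option is the mechanism; the helper name `sandbox_tmpdir_opts` is hypothetical, and the actual change in this PR may look different.

```python
import os


def sandbox_tmpdir_opts(environ=None):
    """Return Bazel options that mount the host's temporary directory into the sandbox.

    Hypothetical helper for illustration only; assumes --sandbox_add_mount_pair is used.
    """
    environ = os.environ if environ is None else environ
    # Fall back to /tmp when $TMPDIR is not set on the host
    tmpdir = environ.get('TMPDIR', '/tmp')
    # Mounting the path onto itself keeps absolute paths (such as the RPATH wrappers) valid
    return ['--sandbox_add_mount_pair=%s' % tmpdir]


if __name__ == '__main__':
    print(' '.join(sandbox_tmpdir_opts()))
```

The resulting string could then be appended to whatever carries the Bazel build options, for example `TENSORSTORE_BAZEL_BUILD_OPTIONS`.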
@boegelbot please test @ jsc-zen3
@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de
PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=23139 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_23139 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!
- exit code: 0
- output:
Submitted batch job 6894
Test results coming soon (I hope)...
- notification for comment with ID 2985605127 processed
Message to humans: this is just bookkeeping information for me, it is of no use to you (unless you think I have a bug, which I don't).
Test report by @boegelbot FAILED Build succeeded for 0 out of 1 (1 easyconfigs in total) jsczen3c2.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.5, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21 See https://gist.github.com/boegelbot/e8082b7c102df5ae5e0ac360ea926a7d for a full test report.
From failed test report:
In file included from external/com_google_protobuf/src/google/protobuf/io/gzip_stream.cc:15:
bazel-out/k8-opt-exec-ST-a828a81199fe/bin/external/com_google_protobuf/src/google/protobuf/io/_virtual_includes/gzip_stream/google/protobuf/io/gzip_stream.h:26:10: fatal error: zlib.h: No such file or directory
26 | #include <zlib.h>
| ^~~~~~~~
compilation terminated.
Test report by @boegel SUCCESS Build succeeded for 1 out of 1 (1 easyconfigs in total) node3515.doduo.os - Linux RHEL 9.4, x86_64, AMD EPYC 7552 48-Core Processor (zen2), Python 3.9.18 See https://gist.github.com/boegel/fa6124e13a5e1ab155eef698b05fec1a for a full test report.
Test report by @akesandgren SUCCESS Build succeeded for 2 out of 2 (1 easyconfigs in total) b-cn1613.hpc2n.umu.se - Linux Ubuntu 22.04, x86_64, AMD EPYC 7313 16-Core Processor, Python 3.10.12 See https://gist.github.com/akesandgren/c9bbda768d0fa550708d4d5aba0699a1 for a full test report.
From failed test report:
In file included from external/com_google_protobuf/src/google/protobuf/io/gzip_stream.cc:15:
bazel-out/k8-opt-exec-ST-a828a81199fe/bin/external/com_google_protobuf/src/google/protobuf/io/_virtual_includes/gzip_stream/google/protobuf/io/gzip_stream.h:26:10: fatal error: zlib.h: No such file or directory
26 | #include <zlib.h>
| ^~~~~~~~
compilation terminated.
No idea what's going on here...
--copt=-I$EBROOTZLIB/include is being passed via TENSORSTORE_BAZEL_BUILD_OPTIONS, but that doesn't seem to be sufficient to make it pick up the zlib.h provided by the zlib dependency?!
There never was a failing test report from the bot in the original PR for this easyconfig, so the problem is not new:
- #22476
I wonder if the renaming of net_zlib to zlib (we use the former in TENSORSTORE_SYSTEM_LIBS) has something to do with this...
This was only done in tensorstore v0.1.75 though (see https://github.com/google/tensorstore/commit/2a3e7864d767ba702849cc0689ff2584b5c10379), so surely it doesn't affect previous versions... Right?
edit: renaming net_zlib to zlib doesn't help, leads to:
ERROR: no such package '@@net_zlib//': java.io.IOException: Error downloading ..
Test report by @jfgrimm SUCCESS Build succeeded for 1 out of 1 (1 easyconfigs in total) node106.viking2.yor.alces.network - Linux Rocky Linux 8.9, x86_64, AMD EPYC 7643 48-Core Processor, Python 3.6.8 See https://gist.github.com/jfgrimm/9b1552150acdc221bb0b6896b077040d for a full test report.
In the TF easyblock, --subcommands --verbose_failures are passed to Bazel to show the commands being executed and the ones that failed, which might help diagnose the issue.
In the TF easyblock we also use --action_env=CPATH=$EBROOTFOO:$EBROOTBAR and, for Bazel >= 3.7, duplicate that into --host_action_env
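As a rough sketch of the approach described above (not the TF easyblock's actual code), the options could be assembled as follows. The helper name `bazel_env_opts` and the choice to point `CPATH` at each dependency's `include` subdirectory are assumptions made for illustration.

```python
def bazel_env_opts(dep_roots, duplicate_for_host=True):
    """Build Bazel options that expose dependency headers via CPATH.

    dep_roots: installation prefixes, e.g. the values of $EBROOTZLIB (illustrative).
    duplicate_for_host: also pass CPATH to actions built for the host/exec configuration.
    """
    # Assumption: headers live in <prefix>/include for every dependency
    cpath = ':'.join('%s/include' % root for root in dep_roots)
    opts = [
        '--subcommands',       # print the full command line of every action
        '--verbose_failures',  # print the full command line of failing actions
        '--action_env=CPATH=%s' % cpath,
    ]
    if duplicate_for_host:
        opts.append('--host_action_env=CPATH=%s' % cpath)
    return opts


if __name__ == '__main__':
    print(' '.join(bazel_env_opts(['/sw/zlib', '/sw/bzip2'])))
```

Using `CPATH` rather than `-I` flags leaves the compiler command lines unchanged, which may matter when the same headers must be visible in more than one Bazel configuration.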
@boegelbot please test @ jsc-zen3
@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de
PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=23139 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_23139 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!
- exit code: 0
- output:
Submitted batch job 8655
Test results coming soon (I hope)...
- notification for comment with ID 3491433595 processed
Test report by @boegelbot FAILED Build succeeded for 0 out of 1 (total: 4 mins 39 secs) (1 easyconfigs in total) jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.6, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21 See https://gist.github.com/boegelbot/b8e9b8ad39b7ca3af087a782238bd576 for a full test report.
--copt=-I$EBROOTZLIB/include is being passed via TENSORSTORE_BAZEL_BUILD_OPTIONS, but that doesn't seem to be sufficient to make it pick up the zlib.h provided by the zlib dependency?!
Depends on the environment used by Bazel
In the TF easyblock we also use --action_env=CPATH=$EBROOTFOO:$EBROOTBAR and, for Bazel >= 3.7, duplicate that into --host_action_env
Since Bazel 7 is used here, it might require --host_copt if the failure occurs in the host/exec configuration
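The failing header path in the test report starts with `bazel-out/k8-opt-exec-ST-a828a81199fe/`, which suggests the action ran in the exec configuration. A small sketch of that check, and of duplicating the include flag for it, is shown below. Both helper names are hypothetical, and the `-exec` test is a heuristic based on Bazel's output directory naming, not an official API.

```python
def looks_like_exec_config(path):
    """Heuristic: does a bazel-out path belong to the exec (host tools) configuration?"""
    parts = path.split('/')
    return len(parts) > 1 and parts[0] == 'bazel-out' and '-exec' in parts[1]


def include_copts(include_dir, for_exec_config=True):
    """Return -I flags for the target configuration and, optionally, the exec configuration."""
    opts = ['--copt=-I%s' % include_dir]
    if for_exec_config:
        # --copt only applies to the target configuration; --host_copt covers host/exec builds
        opts.append('--host_copt=-I%s' % include_dir)
    return opts


if __name__ == '__main__':
    failing = ('bazel-out/k8-opt-exec-ST-a828a81199fe/bin/external/com_google_protobuf/'
               'src/google/protobuf/io/_virtual_includes/gzip_stream/google/protobuf/io/gzip_stream.h')
    print(looks_like_exec_config(failing))
    print(' '.join(include_copts('$EBROOTZLIB/include')))
```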
@boegelbot please test @ jsc-zen3
@boegel: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de
PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=23139 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_23139 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!
- exit code: 0
- output:
Submitted batch job 9192
Test results coming soon (I hope)...
- notification for comment with ID 3659090380 processed
Test report by @boegelbot FAILED Build succeeded for 0 out of 1 (total: 3 mins 42 secs) (1 easyconfigs in total) jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.6, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21 See https://gist.github.com/boegelbot/e1d6c1db69da71d75411d4f91e7af53d for a full test report.
Test report by @Flamefire SUCCESS Build succeeded for 1 out of 1 (total: 5 mins 4 secs) (1 easyconfigs in total) c144 - Linux Rocky Linux 9.6, x86_64, AMD EPYC 9334 32-Core Processor (zen4), 4 x NVIDIA NVIDIA H100, 580.65.06, Python 3.9.21 See https://gist.github.com/Flamefire/6f77b4d24bf8501bcc26db8b14dbbe5f for a full test report.
Test report by @Flamefire SUCCESS Build succeeded for 6 out of 6 (total: 8 mins 32 secs) (1 easyconfigs in total) i7014 - Linux Rocky Linux 9.6, x86_64, AMD EPYC 7702 64-Core Processor (zen2), Python 3.9.21 See https://gist.github.com/Flamefire/8f4498e3db5159c7c7618c6c3e396c8f for a full test report.
It would be good to see the failing command in the error (again) as it is missing in the log. What do you think about https://github.com/easybuilders/easybuild-framework/pull/5074 ?
But comparing the failing GCC invocations, they are literally identical. Running it with -E instead of -c reveals that it is including /usr/include/zlib.h here, i.e. the system-wide header.
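The `-E` check can be sketched roughly as follows: turn the compile command into a preprocess-only command and look at the linemarkers GCC emits (`# <line> "<file>" <flags>`) to see which `zlib.h` was resolved. The helper names are hypothetical and the command handling is simplified (only `-c` and a separate `-o <file>` are treated specially).

```python
import re
import shlex

# GCC preprocessor linemarker, e.g.: # 1 "/usr/include/zlib.h" 1 3 4
LINEMARKER = re.compile(r'^# \d+ "([^"]+)"')


def to_preprocess_cmd(cmd):
    """Turn a compile command into a preprocess-only one: -c becomes -E, "-o <file>" is dropped."""
    result = []
    skip_next = False
    for arg in shlex.split(cmd):
        if skip_next:
            skip_next = False
            continue
        if arg == '-o':
            skip_next = True  # drop the output file so the preprocessed source goes to stdout
            continue
        result.append('-E' if arg == '-c' else arg)
    return result


def resolved_headers(preprocessed, name):
    """Return the distinct paths of headers named `name` found in the linemarkers."""
    found = []
    for line in preprocessed.splitlines():
        match = LINEMARKER.match(line)
        if match:
            path = match.group(1)
            if path.rsplit('/', 1)[-1] == name and path not in found:
                found.append(path)
    return found
```

The output of `subprocess.run(to_preprocess_cmd(cmd), capture_output=True, text=True).stdout` could be fed to `resolved_headers(..., 'zlib.h')`; a result under `/usr/include` would mean the system header is used and not the one from the zlib dependency.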
https://github.com/easybuilders/easybuild-easyconfigs/pull/24896 contains both fixes.
https://github.com/boegel/easybuild-easyconfigs/pull/100 would merge it to your branch if you want to keep it in this PR