Dynamic linker build error on Linux ARM64 prevents Linux ARM64 wheels for 0.19 release

Open johnthagen opened this issue 10 months ago • 45 comments

Checklist

Steps to reproduce the issue

ARM64 Linux builds are not available due to a runtime dynamic linker error: "cannot allocate memory in static TLS block"

See: https://github.com/isl-org/Open3D/actions/runs/12604362802/job/35131136413

Error message

____ ERROR collecting test/t/registration/test_transformation_estimation.py ____
ImportError while importing test module '/root/Open3D/python/test/t/registration/test_transformation_estimation.py'.
Hint: make sure your test modules/packages have valid Python names.
Traceback:
../miniconda3/envs/open3d/lib/python3.10/importlib/__init__.py:126: in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
python/test/t/registration/test_transformation_estimation.py:8: in <module>
    import open3d as o3d
../miniconda3/envs/open3d/lib/python3.10/site-packages/open3d/__init__.py:100: in <module>
    from open3d.cpu.pybind import (
E   ImportError: /root/miniconda3/envs/open3d/lib/python3.10/site-packages/open3d/cpu/pybind.cpython-310-aarch64-linux-gnu.so: cannot allocate memory in static TLS block
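
To tell this failure apart from an ordinary missing-module error in CI, one could match on the message text. This is only a rough sketch: the helper names `is_static_tls_error` and `try_import` are made up for illustration, and the marker string is copied from the traceback above.

```python
import importlib

# Message fragment copied from the traceback above.
STATIC_TLS_MARKER = "cannot allocate memory in static TLS block"


def is_static_tls_error(exc):
    """Return True if exc is an ImportError carrying the static TLS message."""
    return isinstance(exc, ImportError) and STATIC_TLS_MARKER in str(exc)


def try_import(module_name="open3d"):
    """Import a module and classify the outcome (hypothetical helper)."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        return "static-tls" if is_static_tls_error(exc) else "other-import-error"
    return "ok"


if __name__ == "__main__":
    print(try_import("open3d"))
```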

Open3D, Python and System information

- Linux
- ARM64

Additional information

This prevents building Linux ARM wheels for the 0.19 release

  • #7121
  • #7128

Linux ARM wheels are especially important for macOS developers who run Docker containers locally. Since macOS devices have migrated to ARM64 architecture, Linux containers running on macOS ARM64 require ARM64 Linux wheels.

This prevents us from upgrading to Open3D 0.19, and thus transitively also prevents us from having Python 3.12 support.

johnthagen avatar Jan 08 '25 17:01 johnthagen

+1, still an issue. is there any update here?

fabiannagel avatar Feb 03 '25 16:02 fabiannagel

Not totally sure, but @ssheorey may be working on this in

  • https://github.com/isl-org/Open3D/pull/7134

It's mentioned in the "CI errors left" section

johnthagen avatar Feb 03 '25 16:02 johnthagen

It is getting more pressing for us as well. The rest of the ecosystem is slowly but surely moving to numpy 2.0 and open3d is a straggler for us currently since we are still on 0.18.

rowanG077 avatar Feb 05 '25 00:02 rowanG077

On RPI4 the error

cannot allocate memory in static TLS block

could be resolved by

export XDG_SESSION_TYPE=x11
export LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libgomp.so.1
export LD_PRELOAD=~/.your_python_environment_path/open3d/lib/python3.12/site-packages/open3d/cpu/libOpen3D.so.0.19

git tells me that the last commit was

1e7b17438687a0b0c1e5a7187321ac7044afe275

Build fixes for v0.19 (#7128)

So it seems to be related to Ubuntu versions using Wayland. Setting XDG_SESSION_TYPE to x11 restores the old behavior.
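
Note that in the commands above LD_PRELOAD is assigned twice, so the second value replaces the first. If the intent is to preload both libraries, they need to go into one list (the dynamic loader accepts colon- or space-separated entries). A minimal sketch, assuming a hypothetical helper `build_preload_env` and treating the XDG_SESSION_TYPE setting as part of the reported workaround rather than a confirmed fix:

```python
import os
import re


def build_preload_env(libraries, base_env=None):
    """Return a copy of the environment with every library listed in LD_PRELOAD.

    Hypothetical helper: existing LD_PRELOAD entries are kept, and new
    ones are appended in order.
    """
    env = dict(os.environ if base_env is None else base_env)
    # ld.so accepts entries separated by colons or whitespace.
    existing = [p for p in re.split(r"[:\s]+", env.get("LD_PRELOAD", "")) if p]
    merged = existing + [lib for lib in libraries if lib not in existing]
    env["LD_PRELOAD"] = ":".join(merged)
    # Mirrors the workaround reported above; whether it is needed is unverified.
    env["XDG_SESSION_TYPE"] = "x11"
    return env
```

The result can be passed as `env=` to `subprocess.run([sys.executable, "-c", "import open3d"], ...)`, with the libgomp path from the comment above and your own libOpen3D path as the list entries.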

nanotuxi avatar Feb 23 '25 07:02 nanotuxi

how is it going?

johnnynunez avatar Mar 05 '25 16:03 johnnynunez

No updates - we need someone with ARM64 hardware to submit a PR to fix this.

ssheorey avatar Mar 06 '25 00:03 ssheorey

No updates - we need someone with ARM64 hardware to submit a PR to fix this.

I have a Jetson AGX Orin, but you can also use GitHub ARM runners. I did that in my PR, @ssheorey: https://github.com/johnnynunez/Open3D/blob/main/.github/workflows/ubuntu-openblas.yml

PR: https://github.com/isl-org/Open3D/pull/7188

name: Ubuntu OpenBLAS
permissions: {}

on:
  workflow_dispatch:
  push:
    branches:
      - main
  pull_request:
    types: [opened, reopened, synchronize]

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

env:
  GCE_GPU_CI_SA: ${{ secrets.GCE_GPU_CI_SA }}
  GCE_CLI_GHA_VERSION: '416.0.0'      # Fixed to avoid dependency on API changes

jobs:
  openblas-amd64:
    permissions:
      contents: read
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
    steps:
      - name: Checkout source code
        uses: actions/checkout@v4
      - name: Maximize build space
        run: |
          source util/ci_utils.sh
          maximize_ubuntu_github_actions_build_space
      - name: Docker build
        run: docker/docker_build.sh openblas-amd64-py310-dev

      - name: Docker test
        run: docker/docker_test.sh openblas-amd64-py310-dev

  openblas-arm64:
    permissions:
      contents: read
    runs-on: ubuntu-24.04-arm
    strategy:
      fail-fast: false
    steps:
      - name: Checkout source code
        uses: actions/checkout@v4
      - name: Maximize build space
        run: |
          source util/ci_utils.sh
          maximize_ubuntu_github_actions_build_space
      - name: Docker build
        run: docker/docker_build.sh openblas-arm64-py310-dev

      - name: Docker test
        run: docker/docker_test.sh openblas-arm64-py310-dev

johnnynunez avatar Mar 06 '25 22:03 johnnynunez

Also vtk now has arm wheels https://gitlab.kitware.com/vtk/vtk/-/merge_requests/11895

johnnynunez avatar Mar 06 '25 22:03 johnnynunez

https://github.com/pytorch/pytorch/issues/2575

johnnynunez avatar Mar 06 '25 23:03 johnnynunez

I'm testing whether my fix works; I will open a PR if it does.

johnnynunez avatar Mar 07 '25 00:03 johnnynunez

Thanks @johnnynunez . Looking forward to the fix. Would also appreciate a PR setting up ARM64 github runners - we rely on emulation at the moment, which is quite slow.

ssheorey avatar Mar 07 '25 01:03 ssheorey

Thanks @johnnynunez . Looking forward to the fix. Would also appreciate a PR setting up ARM64 github runners - we rely on emulation at the moment, which is quite slow.

Yes, emulation is very slow; other projects also used QEMU before. Now it is really fast with GitHub ARM runners. I have ported more than 60 libraries with CI using https://github.com/pypa/cibuildwheel/releases/tag/v2.23.0 and https://github.com/Jimver/cuda-toolkit/releases/tag/v0.2.21. For now we can only really test CPU-only ARM, but ARM setups with GPUs (Intel/AMD/NVIDIA) should come in the future.

johnnynunez avatar Mar 07 '25 01:03 johnnynunez

first PR https://github.com/isl-org/Open3D/pull/7189

johnnynunez avatar Mar 07 '25 01:03 johnnynunez

The Ubuntu 20.04 runner is being shut down this month.

johnnynunez avatar Mar 07 '25 01:03 johnnynunez

Possible error:

#11 2.114     package                    |            build
#11 2.114     ---------------------------|-----------------
#11 2.114     ca-certificates-2025.2.25  |       hd43f75c_0         129 KB
#11 2.114     openssl-3.0.16             |       h998d150_0         5.2 MB
#11 2.114     xz-5.6.4                   |       h998d150_1         573 KB
#11 2.114     ------------------------------------------------------------
#11 2.114                                            Total:         5.9 MB
#11 2.114 
#11 2.114 The following NEW packages will be INSTALLED:
#11 2.114 
#11 2.114   _libgcc_mutex      pkgs/main/linux-aarch64::_libgcc_mutex-0.1-main 
#11 2.114   _openmp_mutex      pkgs/main/linux-aarch64::_openmp_mutex-5.1-51_gnu 
#11 2.114   bzip2              pkgs/main/linux-aarch64::bzip2-1.0.8-h998d150_6 
#11 2.114   ca-certificates    pkgs/main/linux-aarch64::ca-certificates-2025.2.25-hd43f75c_0 
#11 2.114   expat              pkgs/main/linux-aarch64::expat-2.6.4-h419075a_0 
#11 2.114   ld_impl_linux-aar~ pkgs/main/linux-aarch64::ld_impl_linux-aarch64-2.40-h48e3ba3_0 
#11 2.114   libffi             pkgs/main/linux-aarch64::libffi-3.4.4-h419075a_1 
#11 2.114   libgcc-ng          pkgs/main/linux-aarch64::libgcc-ng-11.2.0-h1234567_1 
***#11 2.114   libgomp            pkgs/main/linux-aarch64::libgomp-11.2.0-h1234567_1***
#11 2.114   libstdcxx-ng       pkgs/main/linux-aarch64::libstdcxx-ng-11.2.0-h1234567_1 
#11 2.114   libuuid            pkgs/main/linux-aarch64::libuuid-1.41.5-h998d150_0 
#11 2.114   ncurses            pkgs/main/linux-aarch64::ncurses-6.4-h419075a_0 
#11 2.114   openssl            pkgs/main/linux-aarch64::openssl-3.0.16-h998d150_0 
#11 2.114   pip                pkgs/main/linux-aarch64::pip-25.0-py312hd43f75c_0 
#11 2.114   python             pkgs/main/linux-aarch64::python-3.12.9-h8edadfe_0 
#11 2.114   readline           pkgs/main/linux-aarch64::readline-8.2-h998d150_0 
#11 2.114   setuptools         pkgs/main/linux-aarch64::setuptools-75.8.0-py312hd43f75c_0 
#11 2.114   sqlite             pkgs/main/linux-aarch64::sqlite-3.45.3-h998d150_0 
#11 2.114   tk                 pkgs/main/linux-aarch64::tk-8.6.14-h987d8db_0 
#11 2.114   tzdata             pkgs/main/noarch::tzdata-2025a-h04d1e81_0 
#11 2.114   wheel              pkgs/main/linux-aarch64::wheel-0.45.1-py312hd43f75c_0 
#11 2.114   xz                 pkgs/main/linux-aarch64::xz-5.6.4-h998d150_1 
#11 2.114   zlib               pkgs/main/linux-aarch64::zlib-1.2.13-h998d150_1 

Anaconda installs its own version of libgomp, which in some cases is compiled without the -fPIC flag and uses static TLS. This can conflict with other libraries and cause the TLS mapping error.
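
One way to check which libgomp actually ends up in the process is to look at the memory map on Linux. A small sketch, assuming the hypothetical helper names `loaded_libraries` and `loaded_in_this_process`; run the second one after `import open3d` (or after the import has been attempted) inside the conda environment:

```python
import os


def loaded_libraries(maps_text, needle="libgomp"):
    """Return sorted unique mapped file paths whose file name contains needle.

    maps_text is the content of /proc/<pid>/maps.
    """
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if needle in os.path.basename(path):
                paths.add(path)
    return sorted(paths)


def loaded_in_this_process(needle="libgomp"):
    """Inspect the current process (Linux only)."""
    with open("/proc/self/maps") as handle:
        return loaded_libraries(handle.read(), needle)
```

If the reported path points into the conda environment rather than /usr/lib/aarch64-linux-gnu, that would be consistent with the conflict described above, though it does not prove it.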

johnnynunez avatar Mar 07 '25 01:03 johnnynunez

We can switch to pyenv to install the specific Python version - this uses the OS libraries directly and does not cause conflicts like this.

ssheorey avatar Mar 07 '25 06:03 ssheorey

there are errors with pyenv

#22 1575.3 [ 94%] Building CXX object cpp/pybind/CMakeFiles/pybind.dir/docstring.cpp.o
#22 1575.4 [ 94%] Building CXX object cpp/pybind/CMakeFiles/pybind.dir/open3d_pybind.cpp.o
#22 1576.5 [ 94%] Building CXX object cpp/pybind/CMakeFiles/pybind.dir/pybind_utils.cpp.o
#22 1583.4 [ 94%] Linking CXX executable ../../bin/tests
#22 1588.6 /usr/bin/ld: ../../curl/lib/libcurl.a(idn.c.o): in function `Curl_idn_decode':
#22 1588.6 idn.c:(.text+0x58): undefined reference to `idn2_check_version'
#22 1588.6 /usr/bin/ld: idn.c:(.text+0x74): undefined reference to `idn2_lookup_ul'
#22 1588.6 /usr/bin/ld: idn.c:(.text+0xa0): undefined reference to `idn2_free'
#22 1588.6 /usr/bin/ld: idn.c:(.text+0xdc): undefined reference to `idn2_lookup_ul'
#22 1588.6 /usr/bin/ld: ../../curl/lib/libcurl.a(idn.c.o): in function `Curl_free_idnconverted_hostname':
#22 1588.6 idn.c:(.text+0x120): undefined reference to `idn2_free'
#22 1588.6 /usr/bin/ld: ../../curl/lib/libcurl.a(idn.c.o): in function `Curl_idnconvert_hostname':
#22 1588.6 idn.c:(.text+0x1bc): undefined reference to `idn2_check_version'
#22 1588.6 /usr/bin/ld: idn.c:(.text+0x1d4): undefined reference to `idn2_lookup_ul'
#22 1588.6 /usr/bin/ld: idn.c:(.text+0x200): undefined reference to `idn2_lookup_ul'
#22 1588.6 collect2: error: ld returned 1 exit status
#22 1588.6 make[2]: *** [cpp/tests/CMakeFiles/tests.dir/build.make:2036: bin/tests] Error 1
#22 1588.6 make[1]: *** [CMakeFiles/Makefile2:4034: cpp/tests/CMakeFiles/tests.dir/all] Error 2
#22 1607.9 [ 94%] Linking CXX shared module ../../lib/Release/Python/cpu/pybind.cpython-312-aarch64-linux-gnu.so
#22 1657.6 [ 94%] Built target pybind
#22 1657.6 make: *** [Makefile:156: all] Error 2
#22 ERROR: process "/bin/bash -c mkdir build  && cd build  && cmake     -DBUILD_UNIT_TESTS=ON     -DCMAKE_BUILD_TYPE=Release     -DCMAKE_INSTALL_PREFIX=~/open3d_install     -DDEVELOPER_BUILD=${DEVELOPER_BUILD}     ..  && make -j$(nproc)  && make install-pip-package -j$(nproc)  && make install -j$(nproc)" did not complete successfully: exit code: 2
------
 > [17/18] RUN mkdir build  && cd build  && cmake     -DBUILD_UNIT_TESTS=ON     -DCMAKE_BUILD_TYPE=Release     -DCMAKE_INSTALL_PREFIX=~/open3d_install     -DDEVELOPER_BUILD=OFF     ..  && make -j$(nproc)  && make install-pip-package -j$(nproc)  && make install -j$(nproc):
1588.6 /usr/bin/ld: ../../curl/lib/libcurl.a(idn.c.o): in function `Curl_idnconvert_hostname':
1588.6 idn.c:(.text+0x1bc): undefined reference to `idn2_check_version'
1588.6 /usr/bin/ld: idn.c:(.text+0x1d4): undefined reference to `idn2_lookup_ul'
1588.6 /usr/bin/ld: idn.c:(.text+0x200): undefined reference to `idn2_lookup_ul'
1588.6 collect2: error: ld returned 1 exit status
1588.6 make[2]: *** [cpp/tests/CMakeFiles/tests.dir/build.make:2036: bin/tests] Error 1
1588.6 make[1]: *** [CMakeFiles/Makefile2:4034: cpp/tests/CMakeFiles/tests.dir/all] Error 2
1607.9 [ 94%] Linking CXX shared module ../../lib/Release/Python/cpu/pybind.cpython-312-aarch64-linux-gnu.so
1657.6 [ 94%] Built target pybind
1657.6 make: *** [Makefile:156: all] Error 2
------
Dockerfile.openblas:93
--------------------
  92 |     # Build Open3D: create build directory, run CMake configuration, build, and install
  93 | >>> RUN mkdir build \
  94 | >>>  && cd build \
  95 | >>>  && cmake \
  96 | >>>     -DBUILD_UNIT_TESTS=ON \
  97 | >>>     -DCMAKE_BUILD_TYPE=Release \
  98 | >>>     -DCMAKE_INSTALL_PREFIX=~/open3d_install \
  99 | >>>     -DDEVELOPER_BUILD=${DEVELOPER_BUILD} \
 100 | >>>     .. \
 101 | >>>  && make -j$(nproc) \
 102 | >>>  && make install-pip-package -j$(nproc) \
 103 | >>>  && make install -j$(nproc)
 104 |     RUN cp build/lib/python_package/pip_package/*.whl /
--------------------
ERROR: failed to solve: process "/bin/bash -c mkdir build  && cd build  && cmake     -DBUILD_UNIT_TESTS=ON     -DCMAKE_BUILD_TYPE=Release     -DCMAKE_INSTALL_PREFIX=~/open3d_install     -DDEVELOPER_BUILD=${DEVELOPER_BUILD}     ..  && make -j$(nproc)  && make install-pip-package -j$(nproc)  && make install -j$(nproc)" did not complete successfully: exit code: 2


johnnynunez avatar Mar 07 '25 10:03 johnnynunez

You could also try installing Python on Ubuntu using the deadsnakes PPA

  • https://launchpad.net/~deadsnakes/+archive/ubuntu/ppa

johnthagen avatar Mar 07 '25 11:03 johnthagen

@johnthagen @ssheorey it now stops at 91%, with an error about the libcurl library

logs_35346254383.zip

johnnynunez avatar Mar 07 '25 14:03 johnnynunez

It seems there are issues with:

  • libcurl
  • idn2
  • openssl
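
To see at a glance which symbols the linker is missing, the log can be reduced to its unique undefined references. A minimal sketch with a hypothetical helper `undefined_symbols`; the regular expression assumes the GNU ld message format shown in the log earlier in this thread:

```python
import re

# GNU ld quotes the symbol with an opening backtick and a closing apostrophe.
UNDEFINED_REF = re.compile(r"undefined reference to [`'](?P<symbol>[^`']+)'")


def undefined_symbols(log_text):
    """Return the sorted unique symbols that ld reports as undefined."""
    return sorted({m.group("symbol") for m in UNDEFINED_REF.finditer(log_text)})


if __name__ == "__main__":
    import sys

    # Usage: python undefined_symbols.py < build.log
    for symbol in undefined_symbols(sys.stdin.read()):
        print(symbol)
```

In the pyenv build log quoted earlier, every undefined symbol starts with idn2_, which suggests the static libcurl was configured with libidn2 support while libidn2 itself is not on the link line.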

johnnynunez avatar Mar 07 '25 15:03 johnnynunez

I checked: -fPIC and the global-dynamic TLS model are both already ON

set(CMAKE_POSITION_INDEPENDENT_CODE ON)
if (LINUX_AARCH64)
# Fix for ImportError: ... /pybind.cpython-310-aarch64-linux-gnu.so: cannot allocate memory in static TLS block
# https://bugs.launchpad.net/ubuntu/+source/mysql-8.0/+bug/1889851
    add_compile_options("-ftls-model=global-dynamic")
endif()
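
To verify what the compile option actually produced, one option is to look at the dynamic section of each shared object: a library built with the initial-exec TLS model normally carries the STATIC_TLS flag. The sketch below assumes binutils `readelf` is installed; the helper names are hypothetical. A missing flag does not rule out other consumers of the static TLS area, so treat the result as a hint only.

```python
import subprocess


def has_static_tls_flag(readelf_dynamic_output):
    """Return True if a `readelf -d` dump lists STATIC_TLS in its FLAGS entry."""
    for line in readelf_dynamic_output.splitlines():
        # "(FLAGS_1)" lines are skipped because they do not contain "(FLAGS)".
        if "(FLAGS)" in line:
            flags = line.split("(FLAGS)", 1)[1].split()
            if "STATIC_TLS" in flags:
                return True
    return False


def library_uses_static_tls(path):
    """Run readelf on a shared object and check its dynamic flags."""
    result = subprocess.run(
        ["readelf", "-d", path], capture_output=True, text=True, check=True
    )
    return has_static_tls_flag(result.stdout)
```

Running `library_uses_static_tls` over pybind.cpython-*.so, libOpen3D.so and the libgomp in use would show which of them, if any, still request static TLS.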

johnnynunez avatar Mar 07 '25 16:03 johnnynunez

and if we disable OPENMP for aarch64? I mean, it would still use TBB and the system pthreads, so it should not affect performance

johnnynunez avatar Mar 07 '25 16:03 johnnynunez

and if we disable OPENMP for aarch64?

In my opinion (as a user, not maintainer), it would be a much better situation to have Linux ARM64 wheels that support most of what Open3D does rather than no wheels at all.

johnthagen avatar Mar 07 '25 19:03 johnthagen

and if we disable OPENMP for aarch64?

In my opinion (as a user, not maintainer), it would be a much better situation to have Linux ARM64 wheels that support most of what Open3D does rather than no wheels at all.

I mean, OpenCV has the same situation on ARM… It seems to be a bug between glibc and OpenMP, but we would still have TBB and pthreads

johnnynunez avatar Mar 07 '25 20:03 johnnynunez

I've disabled OpenMP and it still does not work. I have to investigate further

johnnynunez avatar Mar 08 '25 23:03 johnnynunez

I now have a GH200, so investigating this will be faster

johnnynunez avatar Apr 19 '25 21:04 johnnynunez

I've finally checked. The failures on arm64 start with this commit: https://github.com/isl-org/Open3D/commit/e86fcb38ddb5cb5e3d77c53cab234ddd687fbcfd

johnnynunez avatar Apr 20 '25 08:04 johnnynunez

Perhaps the regression was related to CUDA? The bug is mentioned in the commit message near:

...
* CUDA latest 11.8 to support PyTorch 2.0
Fix CUDAARCHS in Windows CI, since no GPU present.
ARM Linux: Fix for "cannot allocate memory in TLS blc
...

Perhaps @ssheorey might remember.

johnthagen avatar Apr 23 '25 12:04 johnthagen

I can check, since I have a GH200 now

johnnynunez avatar Apr 25 '25 23:04 johnnynunez

I think I found the problem: the failures started when Open3D upgraded to Ubuntu 20.04. The problem is:

https://bugzilla.redhat.com/show_bug.cgi?id=1722181 says:

the fix in glibc is allowing the installer to run correctly without the LD_PRELOAD workaround

it was one of the reasons people switched x86_64 wheels to manylinux_2_28

| distro / glibc | constant that sets the surplus static-TLS area¹ | default size on AArch64 | Δ relative to 20.04 |
| -- | -- | -- | -- |
| Ubuntu 20.04 LTS (glibc 2.31-0ubuntu9) | TLS_STATIC_SURPLUS = 192 × 3 + 144 × 4 + 512 | 1 664 bytes | baseline |
| Ubuntu 22.04 LTS (glibc 2.35) | TLS_STATIC_SURPLUS = 192 × 4 + 144 × 5 + 1 024 (see patch) | 2 512 bytes | ≈ + 848 bytes (+ 51 %) |
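
The arithmetic in the table can be checked quickly. This sketch takes the constants exactly as written above and does not verify them against the glibc sources; the function and parameter names are placeholders, since the table does not say what the 192 and 144 terms represent.

```python
def surplus(count_192, count_144, extra_bytes):
    """Evaluate 192 * count_192 + 144 * count_144 + extra_bytes, as in the table."""
    return 192 * count_192 + 144 * count_144 + extra_bytes


ubuntu_2004 = surplus(3, 4, 512)    # Ubuntu 20.04 row
ubuntu_2204 = surplus(4, 5, 1024)   # Ubuntu 22.04 row
delta = ubuntu_2204 - ubuntu_2004
percent = 100 * delta / ubuntu_2004

print(ubuntu_2004, ubuntu_2204, delta, round(percent))  # 1664 2512 848 51
```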

johnnynunez avatar Apr 26 '25 00:04 johnnynunez