regression: mold 1.10.1 produces problematic shared library while 1.3.0 works well
Thanks for your great work!
I have used mold 1.3.0 for a long time, and it works like a charm when building the oneflow libraries. Recently I upgraded mold to the latest version, 1.10.1, and it sometimes produces a shared library that fails to load. The build instructions are:
# Install all dependencies (assume on Ubuntu 20.04)
sudo apt install -y libopenblas-dev nasm autoconf libtool
# Clone and build
git clone https://github.com/Oneflow-Inc/oneflow
cd oneflow
mkdir build
cd build
cmake -C ../cmake/caches/cn/fast/cpu.cmake -DCMAKE_BUILD_TYPE=Debug ..
ninja oneflow_py
The instructions to run oneflow are:
source source.sh
python3 -m oneflow --doctor
And the error messages are:
Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 185, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib/python3.8/runpy.py", line 144, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/usr/lib/python3.8/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/home/dev/files/repos/oneflow3/python/oneflow/__init__.py", line 25, in <module>
    import oneflow._oneflow_internal
ImportError: liboneflow.so: ELF load command address/offset not properly aligned
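The error above comes from the dynamic loader rather than from Python itself. As far as I understand, glibc reports it when a PT_LOAD program header has a virtual address and a file offset that are not congruent modulo the segment's alignment, which is what the ELF specification requires. The sketch below is a hypothetical helper (the function names are illustrative, not part of mold or oneflow) that lists such segments. It assumes a 64-bit little-endian ELF file and does not try to reproduce every check the loader performs.

```python
import struct

PT_LOAD = 1  # program header type for loadable segments


def load_segments(data: bytes):
    """Yield (p_offset, p_vaddr, p_align) for each PT_LOAD program header."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # e_ident[4] == 2 means ELFCLASS64, e_ident[5] == 1 means little-endian.
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")
    # ELF64 header fields: e_phoff at 0x20, e_phentsize at 0x36, e_phnum at 0x38.
    (e_phoff,) = struct.unpack_from("<Q", data, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 0x36)
    for index in range(e_phnum):
        base = e_phoff + index * e_phentsize
        # Elf64_Phdr: type, flags, offset, vaddr, paddr, filesz, memsz, align.
        (p_type, _flags, p_offset, p_vaddr,
         _paddr, _filesz, _memsz, p_align) = struct.unpack_from("<IIQQQQQQ", data, base)
        if p_type == PT_LOAD:
            yield p_offset, p_vaddr, p_align


def misaligned_segments(data: bytes):
    """Return the PT_LOAD segments where p_vaddr and p_offset differ modulo p_align."""
    bad = []
    for p_offset, p_vaddr, p_align in load_segments(data):
        # An alignment of 0 or 1 means no alignment is required.
        if p_align > 1 and (p_vaddr - p_offset) % p_align != 0:
            bad.append((p_offset, p_vaddr, p_align))
    return bad
```

Calling `misaligned_segments(open(path, "rb").read())` on a freshly linked liboneflow.so (where `path` is wherever the build placed the file) should return an empty list for a library that loads and at least one tuple for a library that triggers this error, if the assumption about the loader's check holds.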
Note that the build does not always fail. To reproduce the error, you may need to delete the shared libraries (all shared libraries in the build directory and _oneflow_internal.cpython-3*.so in ../python/oneflow) and regenerate them several times until the error occurs.
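The delete-and-rebuild cycle described above can be automated. The following is a minimal sketch, not a tested reproduction script. The function names and the attempt limit are hypothetical, and it assumes that `source.sh` lives in the build directory and that a failing import prints the error text on stderr.

```python
import subprocess
from pathlib import Path

ERROR_TEXT = "ELF load command address/offset not properly aligned"


def remove_shared_libraries(build_dir: Path) -> int:
    """Delete the .so files named in the post so that the next build relinks them."""
    build_dir = build_dir.resolve()
    # Assumption: the build directory sits directly inside the repository root.
    python_pkg = build_dir.parent / "python" / "oneflow"
    targets = list(build_dir.rglob("*.so"))
    targets += list(python_pkg.glob("_oneflow_internal.cpython-3*.so"))
    for path in targets:
        path.unlink()
    return len(targets)


def rebuild_until_failure(build_dir: Path, max_attempts: int = 20, run=subprocess.run):
    """Relink and run the doctor command until the alignment error shows up.

    Returns the 1-based attempt number that failed, or None if every attempt
    up to max_attempts loaded without the error.
    """
    for attempt in range(1, max_attempts + 1):
        remove_shared_libraries(build_dir)
        run(["ninja", "oneflow_py"], cwd=build_dir, check=True)
        # source.sh has to be sourced in the same shell that runs Python.
        result = run(
            ["bash", "-c", "source source.sh && python3 -m oneflow --doctor"],
            cwd=build_dir,
            capture_output=True,
            text=True,
        )
        if ERROR_TEXT in result.stderr:
            return attempt
    return None
```

Run from the repository root, `rebuild_until_failure(Path("build"))` would stop at the first attempt that produces a broken library, leaving that library in place for inspection. The `run` parameter exists only so the loop can be exercised without a real build.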
Would you mind bisecting to find the version or commit at which mold started failing to link the file correctly?
I will do that when I have time.
I also ran into this problem. It happens randomly: sometimes the .so file is OK, and sometimes it is not.
I guess it's related to multi-threading?