
regression: mold 1.10.1 produces problematic shared library while 1.3.0 works well

Status: Open · daquexian opened this issue 2 years ago · 3 comments

Thanks for your great work!

I have used mold 1.3.0 for a long time, and it works like a charm for building the oneflow libraries. Recently I upgraded mold to the latest version, 1.10.1, and now the shared library it produces is sometimes broken and fails to load. The build instructions are:

# Install all dependencies (assuming Ubuntu 20.04)
sudo apt install -y libopenblas-dev nasm autoconf libtool
# Clone and build
git clone https://github.com/Oneflow-Inc/oneflow
cd oneflow
mkdir build
cd build
cmake -C ../cmake/caches/cn/fast/cpu.cmake -DCMAKE_BUILD_TYPE=Debug ..
ninja oneflow_py

The instructions to run oneflow are:

source source.sh
python3 -m oneflow --doctor

The error message is:

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 185, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "/usr/lib/python3.8/runpy.py", line 144, in _get_module_details
    return _get_module_details(pkg_main_name, error)
  File "/usr/lib/python3.8/runpy.py", line 111, in _get_module_details
    __import__(pkg_name)
  File "/home/dev/files/repos/oneflow3/python/oneflow/__init__.py", line 25, in <module>
    import oneflow._oneflow_internal
ImportError: liboneflow.so: ELF load command address/offset not properly aligned
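This message comes from glibc's dynamic loader, which (as far as we understand its checks) refuses to map a PT_LOAD segment whose p_vaddr and p_offset are not congruent modulo p_align. The sketch below is a hypothetical helper, not part of oneflow or mold: it reads only the ELF program headers of a suspect liboneflow.so with the standard struct module and reports segments that fail that congruence check, so a library can be classified without importing it. It assumes a 64-bit ELF file; the function name misaligned_load_segments is illustrative.

```python
import struct
import sys

PT_LOAD = 1  # program header type for loadable segments


def misaligned_load_segments(data):
    """Return (index, p_offset, p_vaddr, p_align) for every PT_LOAD segment
    whose p_vaddr and p_offset are not congruent modulo p_align.

    Assumption: 64-bit ELF only (this is a sketch, not a full ELF parser).
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2:  # EI_CLASS: 2 means ELFCLASS64
        raise ValueError("only 64-bit ELF is handled in this sketch")
    endian = "<" if data[5] == 1 else ">"  # EI_DATA: 1 means little-endian
    # Elf64_Ehdr: e_phoff is a u64 at offset 32; e_phentsize and e_phnum are u16 at 54 and 56
    (e_phoff,) = struct.unpack_from(endian + "Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 54)
    bad = []
    for i in range(e_phnum):
        # Elf64_Phdr: p_type, p_flags (u32), then p_offset, p_vaddr, p_paddr,
        # p_filesz, p_memsz, p_align (u64)
        p_type, _flags, p_offset, p_vaddr, _paddr, _filesz, _memsz, p_align = (
            struct.unpack_from(endian + "IIQQQQQQ", data, e_phoff + i * e_phentsize)
        )
        if p_type != PT_LOAD or p_align <= 1:  # 0 or 1 means no alignment constraint
            continue
        if (p_vaddr - p_offset) % p_align != 0:
            bad.append((i, p_offset, p_vaddr, p_align))
    return bad


if __name__ == "__main__":
    # Usage: python3 check_align.py path/to/liboneflow.so [more.so ...]
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            for i, off, vaddr, align in misaligned_load_segments(f.read()):
                print(f"{path}: PT_LOAD #{i} offset={off:#x} vaddr={vaddr:#x} align={align:#x}")
```

No output means every PT_LOAD segment passed this one check. The same fields can be inspected by hand with `readelf -lW liboneflow.so`, which is handy for comparing a good and a bad build side by side.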

Note that the build does not always fail. To reproduce the error, you may need to delete the shared libraries (all shared libraries in the build directory, plus _oneflow_internal.cpython-3*.so in ../python/oneflow) and regenerate them several times until the error occurs.
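The delete-and-rebuild loop can be automated. The following is a hypothetical sketch: the helper names are illustrative, and it assumes that "all shared libraries in the build directory" means the top-level *.so files of the build directory (not a recursive search). It uses only the commands given in this issue (`ninja oneflow_py`, then `source source.sh` and `python3 -m oneflow --doctor` from the build directory) and stops once the alignment error appears on stderr.

```python
import glob
import os
import subprocess


def clean_shared_libs(build_dir, python_pkg_dir):
    """Delete the shared libraries listed in the issue and return their paths.

    Assumption: only top-level *.so files in build_dir are removed.
    """
    targets = set(glob.glob(os.path.join(build_dir, "*.so")))
    targets.update(glob.glob(os.path.join(python_pkg_dir, "_oneflow_internal.cpython-3*.so")))
    for path in targets:
        os.remove(path)
    return sorted(targets)


def reproduce(build_dir, python_pkg_dir, max_attempts=20):
    """Rebuild and run `--doctor` until the alignment error shows up.

    Returns the attempt number on which it appeared, or None if it never did.
    """
    for attempt in range(1, max_attempts + 1):
        clean_shared_libs(build_dir, python_pkg_dir)
        # check=True raises CalledProcessError if the build itself fails
        subprocess.run(["ninja", "oneflow_py"], cwd=build_dir, check=True)
        result = subprocess.run(
            ["bash", "-c", "source source.sh && python3 -m oneflow --doctor"],
            cwd=build_dir, capture_output=True, text=True,
        )
        if "not properly aligned" in result.stderr:
            return attempt
    return None
```

For example, `reproduce("oneflow/build", "oneflow/python/oneflow")` (paths illustrative) returns the attempt on which the broken library was produced, leaving that library in place for inspection.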

daquexian, Feb 14 '23 06:02

Would you mind bisecting to find the version or commit at which mold started producing a broken file?
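Because the failure is intermittent, a single clean build does not prove a commit is good, so each bisect step has to try several times. Below is a sketch of the decision logic for `git bisect run`, which treats exit code 0 as good, 125 as "cannot test, skip this commit", and other codes from 1 to 127 as bad. The build and trial commands are hypothetical placeholders supplied by the caller; a trial could, for instance, relink liboneflow.so with the freshly built mold and try to load it.

```python
import subprocess

# Exit codes understood by `git bisect run`
GOOD = 0    # commit is good
BAD = 1     # commit is bad (any code 1..127 except 125)
SKIP = 125  # commit cannot be tested; bisect skips it


def verdict(build_ok, trial_ok):
    """Map a build result plus per-trial results to a bisect exit code."""
    if not build_ok:
        return SKIP
    # One failing trial proves the commit is bad; a streak of successes
    # is only probabilistic evidence that it is good.
    return GOOD if all(trial_ok) else BAD


def run_step(build_cmd, trial_cmd, trials=10):
    """Build mold with build_cmd, then run trial_cmd up to `trials` times.

    Both commands are hypothetical placeholders (lists of argv strings).
    """
    if subprocess.run(build_cmd).returncode != 0:
        return verdict(False, [])
    results = []
    for _ in range(trials):
        ok = subprocess.run(trial_cmd).returncode == 0
        results.append(ok)
        if not ok:
            break  # no need to keep going once the commit is known to be bad
    return verdict(True, results)
```

A wrapper script (say, a hypothetical step.py) could end with `sys.exit(run_step(build_cmd, trial_cmd, trials=20))` and be driven by `git bisect start v1.10.1 v1.3.0` followed by `git bisect run python3 step.py`, assuming the release tags use that naming. Raising `trials` lowers the chance of mislabelling a bad commit as good.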

rui314, Feb 14 '23 09:02

I will do it when I have time.

daquexian, Feb 20 '23 03:02

I have also hit this problem. It happens randomly: sometimes the .so file is OK, and sometimes it is not.

My guess is that it is related to multi-threading.
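One way to test the multi-threading guess is to relink identical inputs several times and check whether the output bytes change. The sketch below is a hypothetical helper, not an existing tool: `link_cmd` stands for whatever command links liboneflow.so in your build (for example, as shown by `ninja -v`), and the function names are illustrative. If repeated links yield more than one digest, the output is nondeterministic; repeating the experiment with mold's threading disabled (mold accepts `--no-threads`; confirm with `mold --help` for your version) would then indicate whether threads are involved.

```python
import hashlib
import subprocess


def make_link_once(link_cmd, output_path):
    """Return a callable that reruns link_cmd and returns the bytes written
    to output_path. link_cmd is a hypothetical placeholder argv list."""
    def link_once():
        subprocess.run(link_cmd, check=True)
        with open(output_path, "rb") as f:
            return f.read()
    return link_once


def distinct_digests(link_once, runs=10):
    """Call link_once() `runs` times and return the set of SHA-256 digests
    of its outputs; a deterministic link yields exactly one digest."""
    return {hashlib.sha256(link_once()).hexdigest() for _ in range(runs)}
```

Comparing `len(distinct_digests(...))` with and without threading enabled turns the guess into a yes/no observation, independent of whether any particular run happens to produce a misaligned segment.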

lixin-wei, Aug 10 '23 12:08