LightGBM stuck on fitting with libomp 15.0.7 on new Apple M2 if n_jobs != 1

Open Zahlii opened this issue 1 year ago • 10 comments

from lightgbm import LGBMRegressor
import numpy as np

x = np.random.random((100, 10))
y = x.dot(np.random.random((10,)))
l = LGBMRegressor(n_estimators=1)  # doesn't work (fit hangs); with n_jobs=1 it works
l.fit(x, y)

Darwin PC0455 22.3.0 Darwin Kernel Version 22.3.0: Mon Jan 30 20:39:46 PST 2023; root:xnu-8792.81.3~2/RELEASE_ARM64_T6020 arm64

Name: lightgbm Version: 3.3.5

==> libomp: stable 15.0.7 (bottled) [keg-only]

macOS Ventura 13.2.1

Zahlii avatar Mar 03 '23 19:03 Zahlii

Thanks for using LightGBM. Can you please be more specific about what "doesn't work" means?

Do you get an exception, process crash, something else? Are there any logs you can report?

jameslamb avatar Mar 03 '23 19:03 jameslamb

Hi, sorry for not being clearer. If I omit n_jobs=1, it hangs indefinitely on the fit line and I have to SIGKILL it. When running LightGBM as part of a pytest test suite, I sometimes get a Python segmentation fault around the time the LGBM fit occurs.

Zahlii avatar Mar 03 '23 19:03 Zahlii
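
A hang like the one described above blocks the whole test run and is easy to confuse with a slow fit. One way to surface it is to run the reproducer in a child process with a timeout, so that a hang, a crash (such as a segmentation fault), and a clean exit are reported separately. This is a minimal sketch, not part of LightGBM: the helper name `run_snippet` and the 60-second default are made up for illustration.

```python
import subprocess
import sys


def run_snippet(code: str, timeout_s: float = 60.0) -> str:
    """Run `code` in a fresh interpreter and classify the outcome.

    Hypothetical helper for illustration; returns "ok", "hung", or "crashed".
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child before raising, so no manual
        # SIGKILL is needed.
        return "hung"
    # On POSIX a negative return code means the child was killed by a signal
    # (for example -11 for SIGSEGV). Any non-zero code counts as a crash here.
    return "ok" if proc.returncode == 0 else "crashed"


# The reproducer from the original report, as a string for the child process.
REPRO = """
import numpy as np
from lightgbm import LGBMRegressor

x = np.random.random((100, 10))
y = x.dot(np.random.random((10,)))
LGBMRegressor(n_estimators=1).fit(x, y)
"""
```

Calling `run_snippet(REPRO)` on an affected machine would be expected to return "hung" rather than block forever. Note that it also returns "crashed" if lightgbm or numpy is not installed in that interpreter.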

It's ok. In the future, please provide all the information asked for in the issue template.

How did you install LightGBM? Please be as specific as possible.

jameslamb avatar Mar 03 '23 19:03 jameslamb

brew install miniforge
brew install cmake
brew install gcc
brew install libomp
conda create -n venv-3.9-conda python=3.9.14 -y
conda activate venv-3.9-conda
pip install lightgbm

Some more info; it seems to get stuck here when constructing a booster (from hyperopt): https://github.com/microsoft/LightGBM/blob/v3.3.5/python-package/lightgbm/basic.py#L2639

Setting OMP_NUM_THREADS=1 fixes (both) issues.

Zahlii avatar Mar 03 '23 19:03 Zahlii
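
The OMP_NUM_THREADS workaround can also be applied from inside Python, as long as the variable is set before the OpenMP runtime is loaded. The sketch below assumes nothing imported earlier in the process has already loaded an OpenMP runtime; the function name `fit_reproducer` is hypothetical.

```python
import os

# Assumption: this runs before anything that loads an OpenMP runtime
# (lightgbm, and possibly other compiled packages) has been imported.
os.environ["OMP_NUM_THREADS"] = "1"


def fit_reproducer():
    """Fit the one-tree model from the original report."""
    # Imported lazily so the environment variable above is already set.
    import numpy as np
    from lightgbm import LGBMRegressor

    x = np.random.random((100, 10))
    y = x.dot(np.random.random((10,)))
    return LGBMRegressor(n_estimators=1).fit(x, y)
```

Setting the variable in the shell instead (for example `OMP_NUM_THREADS=1 python script.py`, where `script.py` is a placeholder) avoids the import-order concern entirely.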

Old somewhat related thread: https://github.com/microsoft/LightGBM/issues/4229

Downgrading libomp via homebrew is impossible, as older libomp versions are not compatible with M2.

Zahlii avatar Mar 06 '23 11:03 Zahlii

Issue still persists with libomp 16.0.2

Zahlii avatar Apr 27 '23 14:04 Zahlii

Facing the same problem: I have to set num_threads=1, otherwise the kernel dies shortly after the training job starts. Interestingly, when I set a value >1 (for example 2 or 3), the kernel dies after a few seconds, while if I do not set any value at all (which defaults to 0), it dies almost immediately.

I originally used Homebrew to install lightgbm but switched to building from GitHub because of the error, before I found this thread and set num_threads=1.

I suspect the Homebrew installation would also work with this workaround, but I haven't tested it.

tszyan-bain avatar Jun 08 '23 07:06 tszyan-bain
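
LightGBM's parameter documentation lists `n_jobs` as an alias of `num_threads`, so the two workarounds reported so far refer to the same setting. A minimal sketch of the `num_threads=1` variant with the native training API follows; the `objective` and `verbose` values and the function name `train_single_threaded` are illustrative assumptions, not taken from the posts above.

```python
# Parameters for the native training API. num_threads=1 is the workaround
# reported in this thread; the other values are illustrative assumptions.
PARAMS = {
    "objective": "regression",
    "num_threads": 1,
    "verbose": -1,
}


def train_single_threaded(num_boost_round: int = 1):
    """Train on the random data from the original report with one thread."""
    import numpy as np
    import lightgbm as lgb

    x = np.random.random((100, 10))
    y = x.dot(np.random.random((10,)))
    return lgb.train(PARAMS, lgb.Dataset(x, label=y), num_boost_round=num_boost_round)
```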

@Zahlii we just released LightGBM v4.4.0, with some fixes to macOS support. Could you please check again and see if that resolves the issue?

pip install 'lightgbm>=4.4.0'

I just ran the example you provided and it worked well for me.

  • macOS: 14.4.1 (23E224)
  • chip: M2
  • Python: 3.11.9
  • Python libraries: lightgbm==4.4.0, numpy==1.26.4, scikit-learn==1.5.0
  • OpenMP: libomp: stable 18.1.7 (bottled) [keg-only]

jameslamb avatar Jun 15 '24 05:06 jameslamb
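
Before re-testing, it can help to confirm which LightGBM version the active environment actually imports, since a conda environment and a system Python may differ. The sketch below uses only the standard library; the helper names `release_tuple` and `lightgbm_at_least` are hypothetical, and meeting the minimum version does not by itself show that the hang is gone.

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile


def release_tuple(version_string: str) -> tuple:
    """Turn '4.4.0' into (4, 4, 0), ignoring suffixes such as 'rc1'."""
    parts = []
    for field in version_string.split(".")[:3]:
        digits = "".join(takewhile(str.isdigit, field))
        parts.append(int(digits) if digits else 0)
    # Pad short versions such as '4.4' so tuple comparison behaves.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def lightgbm_at_least(minimum: tuple = (4, 4, 0)) -> bool:
    """Return True if the installed lightgbm is at least `minimum`."""
    try:
        installed = version("lightgbm")
    except PackageNotFoundError:
        # lightgbm is not installed in this environment.
        return False
    return release_tuple(installed) >= minimum
```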

@jameslamb the problem is not fixed for me.

brew info libomp
==> libomp: stable 18.1.8 (bottled) [keg-only]
LLVM's OpenMP runtime library
https://openmp.llvm.org/
Installed
/usr/local/Cellar/libomp/18.1.8 (9 files, 1.7MB)
  Poured from bottle using the formulae.brew.sh API on 2024-06-27 at 01:50:02
From: https://github.com/Homebrew/homebrew-core/blob/HEAD/Formula/lib/libomp.rb
License: MIT
==> Dependencies
Build: cmake ✘, lit ✘
==> Caveats
libomp is keg-only, which means it was not symlinked into /usr/local,
because it can override GCC headers and result in broken builds.

For compilers to find libomp you may need to set:
  export LDFLAGS="-L/usr/local/opt/libomp/lib"
  export CPPFLAGS="-I/usr/local/opt/libomp/include"
==> Analytics
install: 60,255 (30 days), 181,591 (90 days), 515,596 (365 days)
install-on-request: 11,742 (30 days), 35,590 (90 days), 103,479 (365 days)
build-error: 1 (30 days)

My system is almost exactly the same as yours (M2 and everything).

trantrikien239 avatar Jun 27 '24 06:06 trantrikien239

sad 😭

ok thank you for letting us know, we'll try to investigate soon

jameslamb avatar Jun 27 '24 06:06 jameslamb