On macOS, Mac M hardware (ARM), a segmentation fault happens with YDF when pyarrow is installed

Open lusis-ai opened this issue 11 months ago • 8 comments

Setup : MacOSX 13 or 14, Mac M hardware

Prerequisite : Install miniforge3

% conda create --name ydfpandasissue
% conda activate ydfpandasissue
% conda install python=3.10
% conda install pandas
% pip install ydf-0.2.0-cp310-cp310-macosx_13_0_arm64.whl

When running this program (ydf_test.py), it works.

import ydf
import pandas as pd
import numpy as np

dataset = {
    "x1": np.array([0, 0, 0, 1, 1, 1]),
    "x2": np.array([1, 1, 0, 0, 1, 1]),
    "y": np.array([0, 0, 0, 0, 1, 1]),
}

model = ydf.CartLearner(label="y", min_examples=1, task=ydf.Task.CLASSIFICATION).train(dataset)
print(model.describe())

Now install pyarrow from conda or pip. The result is the same: it fails. Only the error message is different.

% conda install pyarrow
% python ydf_test.py
zsh: segmentation fault  python ydf_test.py
% conda uninstall pyarrow
% pip install pyarrow
% python ydf_test.py
libc++abi: terminating due to uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument
zsh: abort      python ydf_test.py

Note that pyarrow is mandatory when working with big tabular datasets stored in parquet files.

lusis-ai avatar Mar 07 '24 15:03 lusis-ai

Thank you for the detailed report, I will have a look

rstz avatar Mar 07 '24 15:03 rstz

A similar issue happens with tensorflow_decision_forests.

After installing tensorflow and tensorflow_decision_forests from pip (as tfdf for ARM is not available on conda), in the same configuration as above, the following error occurred (here in a Python terminal):

Python 3.10.13 | packaged by conda-forge | (main, Dec 23 2023, 15:35:25) [Clang 16.0.6 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow_decision_forests as tfdf
>>> import ydf
[mutex.cc : 453] RAW: Lock blocking 0x600001892898   @

lusis-ai avatar Mar 07 '24 16:03 lusis-ai

I had the same issue, but I failed to make the connection to ydf. As a temporary workaround, I switched to fastparquet, which is the other library pandas supports for reading parquet files. This one works fine for me.
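The workaround above can be sketched as follows. This is a minimal illustration, not part of the original comment: `pick_parquet_engine` is a hypothetical helper, and `"data.parquet"` is a placeholder path. It only checks which engine is installed, so that it can be passed explicitly to pandas via the `engine` argument of `pd.read_parquet`.

```python
import importlib.util


def pick_parquet_engine(preferred=("fastparquet", "pyarrow")):
    """Return the first installed module name from `preferred`, or None.

    Hypothetical helper: it only looks the modules up, it does not import them.
    """
    for name in preferred:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


if __name__ == "__main__":
    engine = pick_parquet_engine()
    print("parquet engine:", engine)
    # With pandas installed, the engine can then be passed explicitly, e.g.
    #   pd.read_parquet("data.parquet", engine=engine)
    # where "data.parquet" is a placeholder path.
```

Note that in the setup described in this comment pyarrow is not installed at all; whether the crash is also avoided when pyarrow is installed but unused is not established in this thread.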

mowoe avatar Mar 08 '24 12:03 mowoe

But the issue is still there when importing tensorflow or tensorflow_decision_forests.

We have utility libraries that import tensorflow, so it crashes with:

libc++abi: terminating due to uncaught exception of type std::__1::system_error: mutex lock failed: Invalid argument
zsh: abort      python ydf_test.py

lusis-ai avatar Mar 08 '24 16:03 lusis-ai

python -c 'import pandas;import tensorflow;import tensorflow_decision_forests'

works fine in my venv which only has fastparquet installed and not pyarrow

mowoe avatar Mar 08 '24 18:03 mowoe

To give some preliminary findings from the crash logs:

  • A protobuf incompatibility is the source of the crash. Both (conda-installed) Pyarrow and (pip-installed) YDF depend on protobuf.
  • AFAICT, pyarrow links dynamically against libprotobuf 25.2. Unfortunately, there's a symbol overlap, where libprotobuf calls ydf. ydf has protobuf 24.3 statically linked. Since the two versions don't match, there's a crash.
  • There has to be an easy way to prevent this mess during compilation (suggestions anyone?) - I imagine one would just have to instruct ydf not to expose protobuf symbols that confuse libprotobuf
  • Very dirty solution (attached, UNTESTED very experimental): If you compile ydf with protobuf 25.2 it actually seems to work. But we obviously cannot keep ydf in sync with every protobuf that's out there. ydf-0.2.0-cp310-cp310-macosx_14_0_arm64.whl.zip
  • It might also be that this is a conda-specific type of issue. I'd strongly prefer not maintaining a conda package alongside a pip package at this point though
  • I'll have to look into TF-DF separately
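
To compare environments against the findings above, a small diagnostic like the following may help. It is a sketch under stated assumptions: `installed_versions` is a hypothetical helper, and the package names are the pip distribution names assumed for this thread. It reports the versions of the Python packages only; it cannot see which libprotobuf a native library is linked against (statically or dynamically).

```python
from importlib import metadata

# Distribution names assumed to be relevant to this issue.
DEFAULT_PACKAGES = (
    "protobuf",
    "pyarrow",
    "ydf",
    "tensorflow",
    "tensorflow_decision_forests",
)


def installed_versions(packages=DEFAULT_PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```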

rstz avatar Mar 08 '24 19:03 rstz

Nice, we manage package consistency with conda but inside a conda env we can also install packages with pip when needed. Tomorrow I will try by using pip only to check.

Thanks for your help

lusis-ai avatar Mar 08 '24 20:03 lusis-ai

Hi,

For the issue with pyarrow: thanks to your pointer, it is resolved just by forcing protobuf to 4.24.3. Installing that protobuf version with conda is also OK, and now it works.

The strange thing is that, even though ydf has protobuf 24.3 statically linked, pip installs the latest version, 4.25.3. There is no strict requirement forcing the protobuf version to 4.24.3 when installing ydf from pip, just protobuf>=3.14. Maybe it should be modified?

Anyway, by doing it manually, it works now.
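
As an illustration of the manual pin (for example `pip install protobuf==4.24.3`), a guard like the one below could be run before importing ydf to confirm the pin is still in place. It is only a sketch: `protobuf_matches` is a hypothetical helper, and 4.24.3 is simply the version reported as working in this comment for ydf 0.2.0.

```python
from importlib import metadata

# Version reported as working with ydf 0.2.0 in this thread (an assumption
# for any other ydf release).
EXPECTED_PROTOBUF = "4.24.3"


def protobuf_matches(expected=EXPECTED_PROTOBUF, installed=None):
    """Return True if the installed protobuf package version equals `expected`.

    `installed` can be passed explicitly; otherwise it is read from the
    package metadata of the current environment.
    """
    if installed is None:
        try:
            installed = metadata.version("protobuf")
        except metadata.PackageNotFoundError:
            return False
    return installed == expected


if __name__ == "__main__":
    if not protobuf_matches():
        print(f"warning: protobuf is not pinned to {EXPECTED_PROTOBUF}")
```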

The TF-DF issue is different: it still crashes, so I cannot use the model.to_tensorflow_saved_model(path) function.

lusis-ai avatar Mar 09 '24 09:03 lusis-ai

I believe this issue has finally been solved for good with version 0.7.0. Closing this but feel free to re-open if there are still issues.

rstz avatar Aug 21 '24 19:08 rstz