mlx
[BUG] bus error
Describe the bug
When attempting to run the tests, for either C++ or Python, I get a generic bus error.
It may well be a local problem on my machine, but it'd be great to rule that out, as I've tried to re-install from scratch and the problem persists.
To Reproduce
C++:
cd build && make test
Python:
python -m unittest discover python/tests
Expected behavior A clear and concise description of what you expected to happen.
Desktop:
- OS Version: macOS 14.2.1
- Version: 0.0.7 (currently at 6ea6b42)
Additional context
- Xcode: 15.1
- cmake: 3.28.1
- Chip: M3 Pro
🤔 I do not see a bus error but I have some differences in my setup:
- Tried the commit you pointed (same)
- OS 14.2.1 (same as you)
- Chip M1 Max (possibly something odd with M3 pro..)
- Xcode 15.1 (same as you)
- cmake version 3.24.2 (seems unlikely to be an issue)
I wiped the build and tried with cmake 3.28.1 and I still see no issue.
So either something else is off with your env or we have an issue with the M3 Pro.
Ok, interesting.
Let's see if someone else with an M3 Pro can give it a try.
Is it just the tests? Are you able to import mlx and do any ops?
What if you try from PyPI? Does that package work for you?
Honestly I would be very surprised if it had to do with the M3, as I know MLX is being used regularly there (unless we did something recently since release 0.0.7).
Yes, I am. I'm trying to figure out whether there's some operation that's causing the problem with no luck yet.
So the tril op seems to consistently cause a bus error. I tested it manually, and all the tests up to this one pass: https://github.com/ml-explore/mlx/blob/449b43762e3f970576f054e54066123c0f37246e/python/tests/test_ops.py#L331. This one is the first to cause a bus error (in test_ops.py).
Simply running
import mlx.core as mx
mx.tril(mx.zeros([1]))
results in a bus error.
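For anyone trying to reproduce this: a bus error kills the interpreter, so it can help to run each snippet in a child process and inspect how it exited. Below is a minimal sketch of that idea. The helper name `run_isolated` is made up for illustration and is not part of mlx; the two snippets at the bottom are the ones from this thread.

```python
import signal
import subprocess
import sys


def run_isolated(snippet: str) -> str:
    """Run `snippet` in a fresh interpreter and describe how it exited.

    Hypothetical helper for triage only; not part of mlx.
    """
    proc = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
        text=True,
    )
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was killed by a signal.
        return f"killed by {signal.Signals(-proc.returncode).name}"
    if proc.returncode != 0:
        # An ordinary Python exception: report the last line of the traceback.
        lines = proc.stderr.strip().splitlines()
        return f"exited with error: {lines[-1] if lines else proc.returncode}"
    return "ok"


if __name__ == "__main__":
    # The two cases discussed in this thread (requires mlx to be installed).
    print(run_isolated("import mlx.core as mx; mx.tril(mx.zeros([1]))"))
    print(run_isolated("import mlx.core as mx; mx.tril(mx.ones((10, 10)))"))
```

A clean `ValueError` shows up as "exited with error: ...", while a crash shows up as "killed by SIGBUS", which makes it easy to tell the two apart when bisecting the test file.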
Wow that is so strange..
I get this: ValueError: [tril] array must be at least 2-D
Is it just that case? Can you do mx.tril(mx.ones((10, 10)))?
Also what's your Pybind11 version? python -c "import pybind11; print(pybind11.__version__)"
Is it just that case? Can you do mx.tril(mx.ones((10, 10)))?
That works actually! Looks like the problem may be with raising exceptions/throwing errors - the test I pointed out above is the first one to expect an exception.
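A quick way to probe the "it's the exception path" hypothesis is to exercise the success path and the error path separately. This is only a sketch: `raises` is a throwaway helper invented here, and the mlx calls are the ones already mentioned in this thread, guarded so the script still runs where mlx is not installed.

```python
def raises(fn, exc_type=Exception) -> bool:
    """Return True if calling fn() raises exc_type, False if it returns normally."""
    try:
        fn()
    except exc_type:
        return True
    return False


if __name__ == "__main__":
    try:
        import mlx.core as mx
    except ImportError:
        mx = None  # mlx not installed; nothing to check

    if mx is not None:
        # Success path: reported to work in this thread.
        mx.tril(mx.ones((10, 10)))
        print("success path ok")
        # Error path: should raise ValueError rather than crash the process.
        print("error path raises ValueError:",
              raises(lambda: mx.tril(mx.zeros([1])), ValueError))
```

If the first line prints and the process then dies, that points at exception propagation rather than at `tril` itself.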
python -c "import pybind11; print(pybind11.__version__)"
2.11.1
But you see the same problem from C++ only right? So it seems unlikely to be a binding issue 🤔
Can you also see the output of:
- uname -m (should be arm)
- How did you install cmake? Via brew? which cmake?
I have seen some funny issues when using Rosetta to do the x86 translation so want to be sure everything is arm / native.
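The same check can be done from inside the Python interpreter that imports mlx, which is useful because a shell and a Python binary can differ in architecture. A rough sketch, assuming macOS exposes the `sysctl.proc_translated` key (it reports 1 under Rosetta and 0 when native, and is absent on some machines); `arch_report` is a name made up for this example.

```python
import platform
import subprocess


def arch_report() -> dict:
    """Report the interpreter's architecture and, on macOS, whether it runs under Rosetta.

    "rosetta" is True/False when it can be determined and None otherwise.
    """
    report = {"machine": platform.machine(), "rosetta": None}
    if platform.system() == "Darwin":
        try:
            out = subprocess.run(
                ["sysctl", "-n", "sysctl.proc_translated"],
                capture_output=True,
                text=True,
            )
        except OSError:
            return report  # sysctl not available; leave "rosetta" unknown
        if out.returncode == 0:
            report["rosetta"] = out.stdout.strip() == "1"
    return report


if __name__ == "__main__":
    print(arch_report())
```

On a native Apple silicon setup, `machine` should read `arm64`; an `x86_64` value there would mean the Python binary itself is an Intel build.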
But you see the same problem from C++ only right? So it seems unlikely to be a binding issue 🤔
Yes that's the weird part.
Can you also see the output of:
uname -m
arm64
- How did you install cmake? Via brew? which cmake?
via brew: /opt/homebrew/bin/cmake
FWIW this is what I get when running make test
Test project <path_to_mlx>/build
Start 1: tests
1/1 Test #1: tests ............................Bus error***Exception: 0.04 sec
0% tests passed, 1 tests failed out of 1
Total Test time (real) = 0.05 sec
The following tests FAILED:
1 - tests (Bus error)
Errors while running CTest
Output from these tests are in: <path_to_mlx>/build/Testing/Temporary/LastTest.log
Use "--rerun-failed --output-on-failure" to re-run the failed cases verbosely.
make: *** [test] Error 8
I get this:
ValueError: [tril] array must be at least 2-D
I am also getting this error. I am working on M2.
Thanks @gboduljak. I'm admittedly a little stumped by this one. Would be good to check on an M3 Pro to verify that has nothing to do with it.. (@jagrit06 might be able to help there).
FYI I tried to uninstall and reinstall Xcode, cmake and gcc and then clone the clean repo, build and test
git clone git@github.com:ml-explore/mlx.git mlx && cd mlx
mkdir -p build && cd build
cmake .. && make -j
make test
Still getting the same bus error reported above.
There is a simple test case in this thread. Maybe if you have a second you can play around with it and see if it also gives you a bus error. That would strongly suggest something is borked in your environment.
There is a simple test case in this thread. Maybe if you have a second you can play around with it and see if it also gives you a bus error. That would strongly suggest something is borked in your environment.
That actually works fine for me.
For reproducibility: I tried compiling via
/usr/bin/clang++ -shared -stdlib=libc++ -std=c++17 -undefined dynamic_lookup $(python3 -m pybind11 --includes) test.cpp -o test$(python3-config --extension-suffix)
and
c++ -shared -std=c++17 -undefined dynamic_lookup $(python3 -m pybind11 --includes) test.cpp -o test$(python3-config --extension-suffix)
And I always get the correct output when running the python test.
Note: with mlx I get a bus error, not a segmentation fault.
@jagrit06 ran on an M3 Max and cannot repro the bus error. (There were some numerical issues, which are fixed in #401, but they are unrelated to the bus error.) :\
Thanks @awni, then unless there's some unlikely difference between M3 Pro and Max I guess this is a local issue. I'll continue to explore what's the cause and post any solution in case someone else faces the same problem at some point.
I am exceedingly curious..
@francescofarina can we close this? Were you ever able to build / run the tests locally?
Yes! I'm still not sure what the problem was but it works well now.
Glad to hear it!