pyTsetlinMachine

Regression Demo does not converge on Apple M1/ARM Mac

Status: Open · schuderer opened this issue 2 years ago · 3 comments

Thank you for providing this library.

I have been using the separate RegressionTsetlinMachine with some success in a few experiments, but needed to reduce training time. After some tests with pyTsetlinMachine and tmu, I decided to switch to pyTsetlinMachine, which gave a speed-up of several orders of magnitude. Oddly enough, my use case no longer converged with pyTsetlinMachine's RegressionTsetlinMachine, even with the same parameters as before (clauses=10000, T=10000, s=2.5).

There are no errors or warnings.

Digging deeper, I found that running RegressionDemo.py only reaches an RMSD of 1.21 instead of the RMSD given in the README; the MAD is 1.0. As a sanity check, I also tried a synthetic dataset with two kinds of samples: binary feature vectors with all 80 bits set to one, paired with a target value of 300, and all-zero feature vectors, paired with a target of 0. The predictions look essentially random, and the MAD is 150. This is with the example's unmodified hyperparameters (the demo uses 80 feature bits, too).
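The sanity check and both metrics can be reproduced in a few lines of plain Python. This is a sketch: make_sanity_dataset, rmsd and mad are hypothetical helpers, not part of pyTsetlinMachine, and the number of samples per class is an assumption. It also shows why a MAD of 150 is a bad sign: on this balanced dataset, a model that ignores its input and always predicts 150 scores exactly that (and so, on average, would predictions drawn uniformly between 0 and 300).

```python
import math

N_BITS = 80  # the demo also uses 80 feature bits

def make_sanity_dataset(n_per_class=100):
    """Hypothetical helper: all-ones vectors map to 300, all-zero vectors map to 0."""
    X = [[1] * N_BITS for _ in range(n_per_class)] + [[0] * N_BITS for _ in range(n_per_class)]
    y = [300.0] * n_per_class + [0.0] * n_per_class
    return X, y

def rmsd(pred, target):
    # Root-mean-square deviation between predictions and targets.
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target))

def mad(pred, target):
    # Mean absolute deviation between predictions and targets.
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)

X, y = make_sanity_dataset()

# A model that learned nothing and always predicts the midpoint 150 ...
constant = [150.0] * len(y)
# ... scores exactly MAD = 150 (and RMSD = 150) on this balanced dataset.
print(mad(constant, y), rmsd(constant, y))  # 150.0 150.0

# A model that has learned the mapping scores MAD = 0.
perfect = [300.0 if sum(x) == N_BITS else 0.0 for x in X]
print(mad(perfect, y))  # 0.0
```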

When a colleague of mine tried it in a VM on an Intel machine, both the Regression Demo and the sanity check converged as expected. The same happened when I tried it on a Windows 10 laptop (thanks to the steps described in #7).

This is the only Apple M1 system I have available for testing. If anyone is able to try it on a comparable system, that would help narrow it down. I suspect it has something to do with the architecture (ARM vs. Intel). My Python interpreter and C compiler are both arm64-native.

  • MacBook Pro M1 (arm64, 2021), macOS 12.4
  • Python 3.8.12 (arm64-native)
  • gcc: Apple clang version 13.1.6, target arm64-apple-darwin21.5.0
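When comparing setups like the ones in this thread, it can help to print the same environment details on every machine. A minimal sketch using only the standard library (environment_report is a hypothetical helper); run it with the same interpreter that pyTsetlinMachine was installed into. On a native Apple Silicon interpreter platform.machine() reports 'arm64', whereas an x86_64 interpreter running under Rosetta 2 reports 'x86_64'.

```python
import platform
import sys
import sysconfig

def environment_report():
    """Collect the details that matter when comparing results across machines."""
    return {
        # 'arm64' for a native Apple Silicon interpreter, 'x86_64' under Rosetta 2 or on Intel.
        "machine": platform.machine(),
        "system": platform.system(),
        "mac_ver": platform.mac_ver()[0],  # empty string on non-macOS systems
        "python": platform.python_version(),
        # Platform tag that locally built C extensions are compiled for.
        "build_platform": sysconfig.get_platform(),
        # Compiler CPython itself was built with (not necessarily the one used for extensions).
        "python_compiler": platform.python_compiler(),
    }

if __name__ == "__main__":
    print(sys.executable)
    for key, value in environment_report().items():
        print(f"{key:16} {value}")
```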

Things I've tried out so far without any success:

  • Installing from the GitHub repo and from a cloned local copy instead of from PyPI
  • Trying the Windows 10 installation workaround (#7) on my Mac as well
  • Removing the -O3 and -ffast-math optimization switches from the makefile's gcc calls
  • Building a wheel locally and installing
  • Purging the pip cache before reinstalling
  • Trying different sets of hyperparameters
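Removing -ffast-math was a reasonable thing to try: among other things, it lets GCC and Clang reassociate floating-point operations (it implies -fassociative-math), and reordered arithmetic can produce slightly different results. Whether that plays any role in this issue is unconfirmed, and removing the flags did not help here. A minimal Python illustration that the order of floating-point additions matters:

```python
# Floating-point addition is not associative, so the order of operations matters.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6

print(left == right)  # False
```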

schuderer, Jul 01 '22 13:07

Hi @schuderer! Did you manage to solve the problem? I just upgraded to a MacBook Pro M1 Max and tested the pyTsetlinMachine RegressionDemo. I am running Python 3.10.7 (main, Sep 14 2022, 22:38:23) [Clang 14.0.0 (clang-1400.0.29.102)], with gcc: Apple clang version 14.0.0 (clang-1400.0.29.102).

It runs as expected on my side: python3.10 ./examples/RegressionDemo.py

RMSD over 25 runs:

#1 RMSD: 0.61 +/- 0.00 (7.43s)
#2 RMSD: 0.61 +/- 0.00 (7.56s)
#3 RMSD: 0.60 +/- 0.00 (7.60s)
#4 RMSD: 0.61 +/- 0.00 (7.41s)
#5 RMSD: 0.61 +/- 0.01 (7.43s)
...
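A sketch of how a running summary like this can be computed, assuming (not confirmed from the demo source here) that each line reports the mean of the per-run RMSDs so far together with a normal-approximation 95% confidence half-width, 1.96 * std / sqrt(n). This would explain why run #1 always shows +/- 0.00: a single value has no spread. running_summary is a hypothetical helper, and the RMSD values fed to it below are made up.

```python
import math

def running_summary(run_rmsds):
    """Return (run number, mean RMSD so far, 95% CI half-width) after each run.

    Assumption: half-width = 1.96 * population std / sqrt(n), a normal approximation.
    """
    summary = []
    for n in range(1, len(run_rmsds) + 1):
        seen = run_rmsds[:n]
        mean = sum(seen) / n
        std = math.sqrt(sum((r - mean) ** 2 for r in seen) / n)
        summary.append((n, mean, 1.96 * std / math.sqrt(n)))
    return summary

# Made-up per-run RMSD values, for illustration only.
for n, mean, half_width in running_summary([0.62, 0.60, 0.61, 0.59]):
    print(f"#{n} RMSD: {mean:.2f} +/- {half_width:.2f}")
```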

olegranmo, Oct 13 '22 09:10

Hi @olegranmo, thank you very much for testing it. I'm afraid I don't remember whether I eventually got it to work. I've since moved on to the regression implementation in your TMU project, which is still faster than the original RegressionTsetlinMachine. Fortunately, I got additional resources to carry out a parameter search, so TMU's speed was acceptable as well. But it's good to hear that pyTsetlinMachine should work now, too! Still, I'm wondering why the newer TMU implementation appears to be slower than pyTsetlinMachine.

schuderer, Oct 13 '22 10:10

Great that it worked out! In the TMU implementation, a larger part of the code has been moved to Python/NumPy; only clause evaluation and clause updates run in CUDA/C. This makes it easier to create and experiment with new kinds of architectures. However, for smaller numbers of features/clauses, the Python overhead becomes a bottleneck. At some point, I plan to move functionality back into C/CUDA, or alternatively use Numba to optimize the Python side. :-)
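The trade-off described above (a roughly fixed Python-side cost per call, plus compiled work that grows with the problem size) can be illustrated with a toy measurement. kernel and orchestrated are hypothetical stand-ins, not TMU code: the built-in sum() plays the role of the compiled clause evaluation/update, and a few Python statements play the role of the per-call orchestration.

```python
import timeit

def kernel(values):
    # Stand-in for the compiled part (e.g. clause evaluation/update in C or CUDA):
    # a single call into C via the built-in sum().
    return sum(values)

def orchestrated(values):
    # Hypothetical per-call Python-side work (input checks, bookkeeping) that
    # runs before and after handing off to the compiled kernel.
    if not isinstance(values, list) or not values:
        raise ValueError("expected a non-empty list")
    meta = {"n": len(values), "first_type": type(values[0]).__name__}
    result = kernel(values)
    meta["done"] = True
    return result

def overhead_fraction(n, number=100, repeat=5):
    """Estimated share of runtime spent outside the kernel for input size n."""
    values = list(range(n))
    t_kernel = min(timeit.repeat(lambda: kernel(values), number=number, repeat=repeat))
    t_total = min(timeit.repeat(lambda: orchestrated(values), number=number, repeat=repeat))
    # Clamp at 0: for large n the difference is within timing noise.
    return max(0.0, (t_total - t_kernel) / t_total)

if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        print(f"n={n:>7}: ~{overhead_fraction(n):.0%} of the time is Python-side overhead")
```

Exact numbers depend on the machine; the point is only that a fixed per-call cost is a large share of the total for small inputs and a negligible share for large ones.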

olegranmo, Oct 13 '22 11:10