Slower Matrix multiplication than numpy
Bug description
I've tried running the Mojo matmul example from the repository's examples directory (https://github.com/modularml/mojo/blob/main/examples/matmul.mojo). The output shows that even the most optimized matrix multiplication in Mojo is still 3 to 4 times slower than NumPy. Following are the results:
CPU Results
Python: 0.003 GFLOPS
Numpy: 363.124 GFLOPS
Naive: 6.225 GFLOPS 2138.28x Python
Vectorized: 22.350 GFLOPS 7677.58x Python
Parallelized: 102.933 GFLOPS 35358.39x Python
Tiled: 104.982 GFLOPS 36062.43x Python
Unrolled: 107.915 GFLOPS 37069.74x Python
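For context, the GFLOPS figures above follow from counting roughly 2·n³ floating-point operations per dense n×n matmul. A minimal Python sketch of how the NumPy row can be measured this way (the matrix size and repetition count here are illustrative, not necessarily what the example uses):

```python
import time
import numpy as np

def numpy_gflops(n: int = 512, reps: int = 10) -> float:
    """Estimate NumPy matmul throughput in GFLOPS.

    A dense n x n matmul performs roughly 2 * n**3 floating-point ops.
    """
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    a @ b  # warm-up, so BLAS thread pools and caches are initialized
    start = time.perf_counter()
    for _ in range(reps):
        a @ b
    elapsed = time.perf_counter() - start
    return (2.0 * n**3 * reps) / elapsed / 1e9

print(f"NumPy: {numpy_gflops():.3f} GFLOPS")
```

Absolute numbers will of course vary with hardware, thread count, and which BLAS library NumPy links against.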
Could someone please explain the performance difference I'm seeing? Matrix multiplication is the most common operation in machine learning, and it's slower in Mojo than in NumPy. NumPy is a Python library wrapping optimized C code and is what most people actually use for AI workloads (rather than matmul written in pure Python), so what's the motivation for using Mojo if it doesn't match conventional Python code backed by NumPy?
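To make the comparison concrete: the "Python" row above benchmarks a pure-Python triple loop, while the "Numpy" row effectively benchmarks the optimized BLAS library (OpenBLAS, MKL, etc.) that NumPy dispatches to. Both compute the same result; a small sketch verifying that, with a hypothetical `naive_matmul` standing in for the pure-Python version:

```python
import numpy as np

def naive_matmul(a, b):
    """Pure-Python triple loop -- the algorithm the 'Python' row benchmarks."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += a[i][p] * b[p][j]
            c[i][j] = s
    return c

a = np.random.rand(8, 8)
b = np.random.rand(8, 8)
# Same math, vastly different speed: NumPy hands the work to a tuned BLAS kernel.
assert np.allclose(naive_matmul(a.tolist(), b.tolist()), a @ b)
```

So the gap in the results is really Mojo's example kernels versus decades of hand-tuned BLAS assembly, not Mojo versus Python.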
Steps to reproduce
git clone https://github.com/modularml/mojo.git
cd mojo/examples
mojo build matmul.mojo
./matmul
System information
OS: Ubuntu 22.04.3 LTS
Mojo version: mojo 24.3.0 (9882e19d)
Modular version: modular 0.7.4 (df7a9e8b)