Slower Matrix multiplication than numpy
Bug description
I've tried running the Mojo matmul example from the repository's examples directory (https://github.com/modularml/mojo/blob/main/examples/matmul.mojo). The output shows that even the most optimized matrix multiplication in Mojo is still 3 to 4 times slower than NumPy. Following are the results:
CPU Results
Python: 0.003 GFLOPS
Numpy: 363.124 GFLOPS
Naive: 6.225 GFLOPS 2138.28x Python
Vectorized: 22.350 GFLOPS 7677.58x Python
Parallelized: 102.933 GFLOPS 35358.39x Python
Tiled: 104.982 GFLOPS 36062.43x Python
Unrolled: 107.915 GFLOPS 37069.74x Python
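For context, the GFLOPS figures above follow from counting roughly 2·n³ floating-point operations per dense n×n matmul. A minimal Python sketch of how the NumPy row can be measured this way (the matrix size and repetition count here are illustrative, not necessarily what the example uses):

```python
import time
import numpy as np

def numpy_gflops(n: int = 512, reps: int = 10) -> float:
    """Estimate NumPy matmul throughput in GFLOPS.

    A dense n x n matmul performs roughly 2 * n**3 floating-point ops.
    """
    a = np.random.rand(n, n).astype(np.float32)
    b = np.random.rand(n, n).astype(np.float32)
    a @ b  # warm-up, so BLAS thread pools and caches are initialized
    start = time.perf_counter()
    for _ in range(reps):
        a @ b
    elapsed = time.perf_counter() - start
    return (2.0 * n**3 * reps) / elapsed / 1e9

print(f"NumPy: {numpy_gflops():.3f} GFLOPS")
```

Absolute numbers will of course vary with hardware, thread count, and which BLAS library NumPy links against.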
Could someone please explain the performance difference I'm seeing? Matrix multiplication is the most common operation in machine learning, and it's slower in Mojo than in NumPy. NumPy is a Python library wrapping optimized C code and is what most people actually use for AI workloads (rather than matmul written in pure Python), so what's the motivation for using Mojo if it doesn't match conventional Python code backed by NumPy?
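To make the comparison concrete: the "Python" row above benchmarks a pure-Python triple loop, while the "Numpy" row effectively benchmarks the optimized BLAS library (OpenBLAS, MKL, etc.) that NumPy dispatches to. Both compute the same result; a small sketch verifying that, with a hypothetical `naive_matmul` standing in for the pure-Python version:

```python
import numpy as np

def naive_matmul(a, b):
    """Pure-Python triple loop -- the algorithm the 'Python' row benchmarks."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            s = 0.0
            for p in range(k):
                s += a[i][p] * b[p][j]
            c[i][j] = s
    return c

a = np.random.rand(8, 8)
b = np.random.rand(8, 8)
# Same math, vastly different speed: NumPy hands the work to a tuned BLAS kernel.
assert np.allclose(naive_matmul(a.tolist(), b.tolist()), a @ b)
```

So the gap in the results is really Mojo's example kernels versus decades of hand-tuned BLAS assembly, not Mojo versus Python.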
Steps to reproduce
git clone https://github.com/modularml/mojo.git
cd mojo/examples
mojo build matmul.mojo
./matmul
System information
OS: Ubuntu 22.04.3 LTS
Mojo version: mojo 24.3.0 (9882e19d)
Modular version: modular 0.7.4 (df7a9e8b)