mojo
mojo copied to clipboard
[BUG]: Mojo is not as fast, as it could be (when printing to stdout is involved)
Bug description
This video (finding prime numbers): https://www.youtube.com/watch?v=D4xX3aS5JjE&pp=ygUTbW9qbyB2cyBnbyB2cyBjb2Rvbg%3D%3D shows that Mojo is fast, but not as fast as other languages like Go or even Codon (compiled Python implementation).
I have performed some benchmarks using similar code as in video:
from time import now
fn is_prime(n: Int) -> Bool:
if n <= 1:
return False
if n <= 3:
return True
if n%2 == 0 or n % 3 == 0:
return False
var i: Int = 5
while i * i <= n:
if n% i == 0 or n% (i+2) ==0:
return False
i += 6
return True
fn find_primes_before_n(n: Int) -> Int:
var total: Int = 0
for i in range(2, n):
if is_prime(i):
total += 1
return total
fn main():
let num = 10_000_000
let start_time: Int = now()
let count: Int = find_primes_before_n(num)
let elapsed_time: Float64 = (now() - start_time) / 1e9
print('Mojo took: ', elapsed_time, 's')
print('total primes before', num, ': ', count)
package main
import (
"fmt"
"time"
)
func isPrime(n int) bool {
if n <= 1 {
return false
}
if n <= 3 {
return true
}
if n%2 == 0 || n%3 == 0 {
return false
}
i := 5
for i*i <= n {
if n%i == 0 || n%(i+2) == 0 {
return false
}
i += 6
}
return true
}
func findPrimesBeforeN(n int) int {
total := 0
for i := 2; i < n; i++ {
if isPrime(i) {
total += 1
}
}
return total
}
func main() {
num := 10_000_000
startTime := time.Now()
count := findPrimesBeforeN(num)
endTime := time.Now()
elapsedTime := endTime.Sub(startTime)
fmt.Println("Go took: ", elapsedTime.Seconds(), "s")
fmt.Println("total primes before", num, ": ", count)
}
Results (Go and Mojo compiled AoT):
Mojo took: 3.7724631670000002 s
total primes before 10000000 : 664579
Go took: 8.895780889 s
total primes before 10000000 : 664579
As you can see in pure computations Mojo is taking 2x less time.
With adding prints: to Mojo:
fn find_primes_before_n(n: Int) -> Int:
var total: Int = 0
for i in range(2, n):
if is_prime(i):
print(i)
total += 1
return total
to Go:
func findPrimesBeforeN(n int) int {
total := 0
for i := 2; i < n; i++ {
if isPrime(i) {
fmt.Println(i)
total += 1
}
}
return total
}
Results (Go and Mojo compiled AoT):
Mojo took: 26.733618268000001 s
total primes before 10000000 : 664579
Go took: 27.621433634 s
total primes before 10000000 : 664579
Mojo is loosing its performance benefits, when stdout is involved. Tested on:
Ubunt 22.10 (VirtualBox)
go1.21.1
Mojo 0.3.1
Any benchmark numbers?
I have attached my benchmarks result in issue.
Thanks @gryznar for reporting. We are aware of this issue and will take steps to resolve this ASAP
@abduld another thing to add here (more important than stdout). In some cases Mojo jitted is much slower than python. For example in sth like that:
fn main():
print("sth1")
print('sth2')
...
print('sth1000')
This script compiled AoT takes 0.04s to run, similar code takes in Python 0.08s, but via:
mojo run script.mojo
it takes over 2s, where 2s takes to compile that.
Thanks @gryznar this is good to know. I believe the fix for the above will make compilation faster as well. Let's take one thing at a time ;)
I hope so, cause compilation time here is alarming
Edit: @abduld Yet another thing, but may be very useful. As it was mentioned here: https://discord.com/channels/1087530497313357884/1098713601386233997/1159447998489710643
in this video: https://youtu.be/kgUXfDpAmGQ?si=ddJZ7Po7xRU6lI7i speakers's compiler can compile 3M lines of code/s. Maybe some tricks introduced there may be ported to Mojo? With such speed startup problems may fade into oblivion 😁
@abduld when you will manage to fix this, could you also look at that: https://github.com/modularml/mojo/issues/1216 ? This is another String related problem, in this case connected with allocations. Root cause may be similar
FYI to anyone interested in improved formatting performance, while we don't have buffered IO yet, I recently implemented a Formatter
abstraction for Mojo that avoids needing intermediate String
allocations when trying to format/print a type, see this recent commit:
[stdlib] feature: Add non-allocating formatting abstraction
which includes more information and examples.
Is there an alternative method for printing? Is it possible to use the Python print function?