136x slower than Numba
I installed Codon 0.15.5 and codon-jit 0.1.3 on Ubuntu 22.04.2 with Python 3.10.6. I'm comparing the script below against its Numba variant (the commented-out decorator line):
import codon
import numba as nb
import numpy as np
import timeit as ti
from math import atan2, sqrt
@codon.jit(pyvars=['atan2', 'sqrt'])
# @nb.njit(fastmath=True, locals=dict(w=nb.uint32))
def getGradAngle(im, grad, angle, w):
    for i in range(im.size-w-1):
        dx = im[i+w+1]-im[i]
        dy = im[i+w]-im[i+1]
        grad[i] = sqrt(dx**2+dy**2)
        angle[i] = atan2(dy, dx)
w, h = 640, 480
im = np.random.rand(w*h).astype('f4')
grad = np.zeros_like(im)
angle = np.zeros_like(im)
fun = f'getGradAngle(im, grad, angle, w)'
t = 1000 * np.array(ti.repeat(stmt=fun, setup=fun, globals=globals(), number=1, repeat=10))
print(f'{fun}: {np.amin(t):6.3f}ms {np.median(t):6.3f}ms {np.sum(grad)}')
I'm getting Codon variant output:
getGradAngle(im, grad, angle, w): 1139.133ms 1145.087ms 159771.875
and Numba variant output:
getGradAngle(im, grad, angle, w): 8.377ms 8.389ms 159693.890625
Plain Python variant output:
getGradAngle(im, grad, angle, w): 1129.810ms 1135.429ms 159641.578125
Why is the Codon variant slower than even the plain Python?
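For reference, the ratio in the title follows from the minimum timings reported above. A quick check, with the numbers copied from the three outputs:

```python
# Minimum timings (ms) copied from the three outputs above.
timings_ms = {"codon": 1139.133, "numba": 8.377, "python": 1129.810}

# Slowdown of each variant relative to the Numba variant.
slowdown = {name: t / timings_ms["numba"] for name, t in timings_ms.items()}
for name, ratio in slowdown.items():
    print(f"{name}: {ratio:.1f}x")
```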
Would you mind trying without pyvars?
It will not compile without pyvars:
codon.codon_jit.JITError: /home/paul/st-python/bench/gh-2.py:15:15: name 'sqrt' is not defined
/home/paul/st-python/bench/gh-2.py:16:16: name 'atan2' is not defined
Hi @pauljurczak -- Codon doesn't support NumPy yet (we're working on a Codon-native NumPy that's fully compiled), so that function is just operating on the Python objects within Codon, meaning there won't be any performance improvement.
FYI you can also avoid the pyvars by importing the math functions inside the @codon.jit function (the JIT'd functions are compiled in their own environment so they don't see external imports).
importing the math functions inside the @codon.jit function
I did that:
@codon.jit
def getGradAngle(im, grad, angle, w):
    from math import atan2, sqrt
    for i in range(im.size-w-1):
        dx = im[i+w+1]-im[i]
        dy = im[i+w]-im[i+1]
        grad[i] = sqrt(dx**2+dy**2)
        angle[i] = atan2(dy, dx)
Performance improved just a bit, but is still poor, i.e. 127x slower than Numba:
getGradAngle(im, grad, angle, w): 1064.479ms 1070.543ms 159669.859375
Yes, again this is expected at the moment until we add Codon-native NumPy. The NumPy arrays are being passed to the function as Python objects since there's no ndarray type in Codon, so the JIT'd code is just using the same CPython API calls that Python is using under the hood, leading to the same performance.
One possible workaround in the meantime is to use lists instead. The discussion here might also be of interest: https://github.com/exaloop/codon/discussions/228.
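A minimal sketch of the list workaround, written as plain Python so it runs without Codon installed. The function name `grad_angle_lists` and the choice to return new lists instead of filling preallocated ones are assumptions for illustration, not something stated in this thread. The `@codon.jit` line is left as a comment to mark where the decorator would go.

```python
# @codon.jit  # assumption: the decorator would be applied here when Codon is installed
def grad_angle_lists(im, w):
    """Compute gradient magnitude and angle over a flat list of pixel values."""
    # Imported inside the function, as suggested above for JIT'd functions.
    from math import atan2, sqrt
    n = len(im)
    grad = [0.0] * n
    angle = [0.0] * n
    for i in range(n - w - 1):
        dx = im[i + w + 1] - im[i]
        dy = im[i + w] - im[i + 1]
        grad[i] = sqrt(dx**2 + dy**2)
        angle[i] = atan2(dy, dx)
    return grad, angle

# Usage: a NumPy array would first be converted with im.tolist().
im = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]  # hypothetical image of width 2
grad, angle = grad_angle_lists(im, 2)
```

Returning the results keeps the sketch independent of whether in-place writes to list arguments are visible to the caller after a JIT'd call.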
One possible workaround in the meantime is to use lists instead.
I took @pauljurczak's example and made a class. How can Codon return the sum of the self.grad list? 0.0 is incorrect.
import codon
import numpy as np
import timeit as ti
@codon.convert
class Foo:
    __slots__ = 'w', 'h', 'im', 'grad', 'angle'
    def __init__(self, w, h):
        im = np.random.rand(w*h).astype('f4')
        self.w = w
        self.h = h
        self.im = im.tolist()
        self.grad = np.zeros_like(im).tolist()
        self.angle = np.zeros_like(im).tolist()
    @codon.jit
    def getGradAngle(self):
        from math import atan2, sqrt
        for i in range(len(self.im)-self.w-1):
            dx = self.im[i+self.w+1]-self.im[i]
            dy = self.im[i+self.w]-self.im[i+1]
            self.grad[i] = sqrt(dx**2+dy**2)
            self.angle[i] = atan2(dy, dx)
    @codon.jit
    def getSum(self) -> float:
        return sum(self.grad)
foo = Foo(640, 480)
fun = f'foo.getGradAngle()'
t = 1000 * np.array(ti.repeat(stmt=fun, setup=fun, globals=globals(), number=1, repeat=10))
print(f'{fun}: {np.amin(t):6.3f}ms {np.median(t):6.3f}ms {foo.getSum()}')
Running:
# Python (@codon.jit lines commented out)
time python demo.py
foo.getGradAngle(): 82.440ms 82.886ms 159764.3314357752
real 0m3.247s
user 0m3.140s
sys 0m0.101s
# codon.jit
time python demo.py
foo.getGradAngle(): 16.248ms 16.366ms 0.0
real 0m2.516s
user 0m2.510s
sys 0m0.131s
The getSum method is also jitted, so I thought this would work. I'm running the develop branch at 725003c.
Next, I tried making a demo.codon demonstration.
from python import numpy as np
from math import atan2, sqrt
from time import time
@tuple
class Foo:
    w: int
    h: int
    im: List[float]
    grad: List[float]
    angle: List[float]
    def __new__(w: int, h: int):
        im = np.random.rand(w*h).astype('f4')
        grad = np.zeros_like(im)
        angle = np.zeros_like(im)
        return Foo(w, h, im.tolist(), grad.tolist(), angle.tolist())
    def getGradAngle(self):
        for i in range(len(self.im)-self.w-1):
            dx = self.im[i+self.w+1]-self.im[i]
            dy = self.im[i+self.w]-self.im[i+1]
            self.grad[i] = sqrt(dx**2+dy**2)
            self.angle[i] = atan2(dy, dx)
    def getSum(self) -> float:
        return sum(self.grad)
foo = Foo(640, 480)
repeat = 10
t0 = time()
for i in range(repeat):
    foo.getGradAngle()
t1 = time()
print(f"foo.getGradAngle(): {(t1-t0)/repeat*1000:6.3f}ms {foo.getSum():12.5f}")
Running:
# Set library path to a Python distribution containing NumPy
export CODON_PYTHON=~/miniconda3/envs/mandel/lib/libpython3.so
# Run
time ./codon run demo.codon
foo.getGradAngle(): 28.023ms 159758.66082
real 0m2.118s
user 0m2.009s
sys 0m0.113s
# Build a release binary (faster)
codon build -release demo.codon
time ./demo
foo.getGradAngle(): 6.639ms 159908.73775
real 0m0.235s
user 0m0.154s
sys 0m0.086s
Finally, I tried the Numba demonstration by @pauljurczak with cache=True.
import numba as nb
import numpy as np
from math import atan2, sqrt
from time import time
@nb.njit(fastmath=True, locals=dict(w=nb.uint32), cache=True)
def getGradAngle(im, grad, angle, w):
    for i in range(im.size-w-1):
        dx = im[i+w+1]-im[i]
        dy = im[i+w]-im[i+1]
        grad[i] = sqrt(dx**2+dy**2)
        angle[i] = atan2(dy, dx)
w, h = 640, 480
im = np.random.rand(w*h).astype('f4')
grad = np.zeros_like(im)
angle = np.zeros_like(im)
fun = f'getGradAngle(im, grad, angle, w)'
repeat = 10
t0 = time()
for i in range(repeat):
    getGradAngle(im, grad, angle, w)
t1 = time()
print(f"{fun}: {(t1-t0)/repeat*1000:6.3f}ms {np.sum(grad):12.5f}")
Running:
rm -fr __pycache__
# First run
time python numba_demo.py
getGradAngle(im, grad, angle, w): 20.154ms 160076.46875
real 0m0.407s
user 0m0.369s
sys 0m0.036s
# 2nd run using jitted cache object
time python numba_demo.py
getGradAngle(im, grad, angle, w): 8.302ms 160129.25000
real 0m0.282s
user 0m0.246s
sys 0m0.035s
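The two demos above average over a loop timed with time(), while the original script reports the minimum and median from timeit.repeat after a setup call. A small sketch of a harness that applies the same measurement to any variant: one untimed warm-up call, so that a possible first-call compilation is not averaged in (which may be part of why the first Numba run shows about 20 ms), followed by per-call timings. The helper name `bench` and its parameters are hypothetical.

```python
from statistics import median
from time import perf_counter

def bench(fn, *args, repeat=10):
    """Return (min_ms, median_ms) over `repeat` timed calls of fn(*args)."""
    fn(*args)  # warm-up call, not timed
    samples = []
    for _ in range(repeat):
        t0 = perf_counter()
        fn(*args)
        samples.append((perf_counter() - t0) * 1000.0)
    return min(samples), median(samples)

# Usage with a stand-in workload (hypothetical; replace with getGradAngle and its arguments).
calls = []
best_ms, median_ms = bench(lambda n: calls.append(sum(range(n))), 10_000, repeat=5)
print(f"min {best_ms:6.3f}ms  median {median_ms:6.3f}ms")
```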