GPU support fails when running from Python
Hi, I built Codon v0.16.1 (https://github.com/exaloop/codon/releases/tag/v0.16.1) from the source tarball and successfully ran the GPU examples with codon. However, when I tried to run modified code from Python through `codon.jit`, I got this error: CUDA error at .../codon-0.16.1/codon/runtime/gpu.cpp:102: named symbol not found
I actually tried two modifications, both of which seem reasonable to me, and got the same error in both cases. I prefer the first template, as it runs 25-30% faster than the second (when run with codon). I also tried removing external functions and variables from the code, but that doesn't help.
Code example 1:
import codon

@codon.jit(debug=True)
def run_mandelbrot(pixels):
    import gpu

    @gpu.kernel
    def mandelbrot(pixels):
        idx = (gpu.block.x * gpu.block.dim.x) + gpu.thread.x
        i, j = divmod(idx, 4096)
        a1 = -2.00
        b1 = 0.47
        a2 = -1.12
        b2 = 1.12
        s1 = a1 + (j/4096)*(b1 - a1)
        s2 = a2 + (i/4096)*(b2 - a2)
        c = complex(s1, s2)
        z = 0j
        iteration = 0
        while abs(z) <= 2 and iteration < 1000:
            z = z**2 + c
            iteration += 1
        pixels[idx] = int(255 * iteration/1000)

    mandelbrot(pixels, grid=(4096*4096)//1024, block=1024)
    return pixels

import numpy as np
import matplotlib.pyplot as plt
from time import time

t0 = time()
pixels = [0 for _ in range(4096 * 4096)]
pixels = run_mandelbrot(pixels)
t1 = time()
print(f'[codon] took {t1 - t0} seconds')
plt.imshow(np.array(pixels).reshape(4096, 4096))
plt.show()
Code example 2:
import codon

@codon.jit(debug=True)
def run_mandelbrot(pixels):
    @par(gpu=True, collapse=2)
    for i in range(4096):
        for j in range(4096):
            a1 = -2.00
            b1 = 0.47
            a2 = -1.12
            b2 = 1.12
            s1 = a1 + (j/4096)*(b1 - a1)
            s2 = a2 + (i/4096)*(b2 - a2)
            c = complex(s1, s2)
            z = 0j
            iteration = 0
            while abs(z) <= 2 and iteration < 1000:
                z = z**2 + c
                iteration += 1
            pixels[i*4096 + j] = int(255 * iteration/1000)
    return pixels

import numpy as np
import matplotlib.pyplot as plt
from time import time

t0 = time()
pixels = [0 for _ in range(4096 * 4096)]
pixels = run_mandelbrot(pixels)
t1 = time()
print(f'[codon] took {t1 - t0} seconds')
plt.imshow(np.array(pixels).reshape(4096, 4096))
plt.show()
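For checking results once the GPU path works, the per-pixel computation shared by both examples could be sketched in plain Python as below. This is only a CPU reference under my own assumptions: the names `mandelbrot_pixel` and `reference_pixels` are hypothetical helpers, not part of Codon, and the `size` and `max_iter` parameters are added so it can be run on a small grid instead of 4096 x 4096.

```python
def mandelbrot_pixel(i, j, size=4096, max_iter=1000):
    """Return the 0..255 value for row i, column j (hypothetical helper)."""
    # Same complex-plane window as in the two examples above.
    a1, b1 = -2.00, 0.47
    a2, b2 = -1.12, 1.12
    s1 = a1 + (j / size) * (b1 - a1)
    s2 = a2 + (i / size) * (b2 - a2)
    c = complex(s1, s2)
    z = 0j
    iteration = 0
    while abs(z) <= 2 and iteration < max_iter:
        z = z**2 + c
        iteration += 1
    return int(255 * iteration / max_iter)


def reference_pixels(size=64, max_iter=1000):
    """Build the flat pixel list in the same row-major order (i*size + j)."""
    return [mandelbrot_pixel(i, j, size, max_iter)
            for i in range(size)
            for j in range(size)]
```

Comparing `reference_pixels(64)` against a GPU run at the same reduced size should be much cheaper than comparing full 4096 x 4096 outputs.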
It would be nice if these code examples could run successfully. Thank you.
Pinging @arshajii, who can help with this one.