How to optimize reading long tuples from Julia to Python
Is your feature request related to a problem? Please describe.
The following code takes around 7 s on a modern i9 the first time it's executed; subsequent executions are immediate. Executing (zeros(20_000)...,) directly in Julia is also immediate.
I have Python 3.12 and Julia 1.10.5.
import juliacall

v = juliacall.Main.seval("(zeros(20_000)...,)")
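For reference, the one-off cost can be isolated with a small timing harness; this is a minimal sketch using only the standard-library time module (the ~7 s is the figure reported above, not a fresh measurement):

import time
import juliacall

t0 = time.perf_counter()
juliacall.Main.seval("(zeros(20_000)...,)")  # first call: Julia compiles code specialised on the 20_000-element tuple type
print("first call: ", time.perf_counter() - t0)

t0 = time.perf_counter()
juliacall.Main.seval("(zeros(20_000)...,)")  # second call: the compiled method is reused, so this is immediate
print("second call:", time.perf_counter() - t0)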
Describe the solution you'd like
Is there any way to speed up the initial compilation?
Describe alternatives you've considered
I tried to use this code to speed up reading data from Julia to Python by going through a tuple. Looping over a long Julia vector from Python is much slower than looping over a tuple (around 4 times slower to read a 100_000-element float vector than a tuple), as in the sketch below.
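For concreteness, the tuple workaround might look something like this (get_t is a hypothetical name). The first call pays the compilation cost because Julia specialises on the tuple length; after that, JuliaCall hands the result back as a plain Python tuple, so the summation loop never calls back into Julia, which is presumably where the ~4x speedup comes from:

from juliacall import Main as jl

jl.seval("get_t(n) = (rand(n)...,)")  # splatting yields an NTuple{n, Float64}

t = jl.get_t(100_000)  # first call: huge one-off compilation overhead
total = 0.0
for x in t:  # t is an ordinary Python tuple, so iteration stays in Python
    total += x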
I can't explain the speed difference (seval directly calls Julia to execute that code; there is essentially zero overhead from JuliaCall), but you shouldn't be creating tuples of thousands of elements in Julia anyway. What are you actually trying to do?
I tried this code to speed up reading long vectors from Julia to Python by going through a tuple. Looping over a long Julia vector from Python is much slower than looping over a tuple (around 4 times slower to read a 100_000-element float vector than a tuple), so I tried converting the vector to a tuple in Julia before reading it. It is indeed much faster, but the compilation overhead is huge. An alternative I found: instead of reading the long Julia vector from Python, I pass a Python list to Julia and have Julia populate it. This is also around 3x faster.
Can you give some code to demonstrate what you're doing, i.e. some code that works but is slow that you'd like to be faster?
get_v is what I'd like to optimize (reading a long Julia vector).
set_v is the workaround: by passing a Python list to Julia and having Julia populate it, reading the vector becomes around 3 times faster.
from juliacall import Main as jl
import timeit

# get_v allocates and returns a Julia vector (v is ignored; it is only there
# so both functions share the same signature).
jl.seval("get_v(v, n) = rand(n)")

# setPyVector copies a Julia vector element by element into the Python list.
# The list is seen from Julia as a 1-based vector wrapper, so enumerate's
# indices line up.
jl.seval("""
function setPyVector(py_v, jl_v)
    @inbounds for (i, x) in enumerate(jl_v)
        py_v[i] = x
    end
end
""")

# set_v generates the data in Julia and writes it into the preallocated Python list.
jl.seval("function set_v(v, n) setPyVector(v, get_v(v, n)); return v end")

def benchmark(f, vector_len, retries) -> None:
    def workload():
        total = 0.0
        v = vector_len * [None]
        result = f(v, vector_len)
        for x in result:
            total += x
    workload()  # warmup: triggers Julia compilation
    print("running ", f)
    print("elapsed ", timeit.timeit(workload, number=retries))

benchmark(jl.get_v, vector_len=100_000, retries=100)
benchmark(jl.set_v, vector_len=100_000, retries=100)
This is what I get on a modern i9:
running get_v
elapsed 6.038048821996199
running set_v
elapsed 1.921932346012909
The slow part won't be the call to get_v but the summation you perform afterwards. This is because get_v returns a lazy wrapper around the rand(n) array, so when you do the sum, each element is materialised one by one by calling into Julia. In your case it will be faster to materialise the whole array as a Python list in one go, such as by defining get_v(v, n) = pylist(rand(n)).
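Applied to the benchmark above, that suggestion is a one-line change (get_v_list is a hypothetical name; pylist is PythonCall's eager conversion to a Python list):

jl.seval("get_v_list(v, n) = pylist(rand(n))")  # materialise the whole array as a Python list in one call
benchmark(jl.get_v_list, vector_len=100_000, retries=100)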