PyCall.jl
pyjlwrap support for Pickle Serialization
PyCall works well for many use cases between Python and Julia, but one that matters a great deal to the data science community could be improved. For example, I tried using it with the PySpark library, and it works very well for the basic use case. However, if the user needs to create a UDF (user-defined function), serializing the function fails. UDFs would let many data scientists reuse Julia code while calling Spark to do the heavy work. Enabling this would broaden the use of Julia across different scenarios.
To solve the current issue with UDFs, PyObject needs to be serializable with pickle. I don't have a clear idea of how to solve this, but here is a simple case that, if fixed, would be a step toward this functionality:
Example:
using PyCall
pickle = pyimport("pickle")
pickle.dumps(x -> x + 1)
Error:
ERROR: PyError ($(Expr(:escape, :(ccall(#= /root/.julia/packages/PyCall/zqDXB/src/pyfncall.jl:43 =# @pysym(:PyObject_Call), PyPtr, (PyPtr, PyPtr, PyPtr), o, pyargsptr, kw))))) <class 'TypeError'>
TypeError("cannot pickle 'PyCall.jlwrap' object")
Stacktrace:
[1] pyerr_check at /root/.julia/packages/PyCall/zqDXB/src/exception.jl:60 [inlined]
[2] pyerr_check at /root/.julia/packages/PyCall/zqDXB/src/exception.jl:64 [inlined]
[3] _handle_error(::String) at /root/.julia/packages/PyCall/zqDXB/src/exception.jl:81
[4] macro expansion at /root/.julia/packages/PyCall/zqDXB/src/exception.jl:95 [inlined]
[5] #110 at /root/.julia/packages/PyCall/zqDXB/src/pyfncall.jl:43 [inlined]
[6] disable_sigint at ./c.jl:446 [inlined]
[7] __pycall! at /root/.julia/packages/PyCall/zqDXB/src/pyfncall.jl:42 [inlined]
[8] _pycall!(::PyObject, ::PyObject, ::Tuple{var"#3#4"}, ::Int64, ::Ptr{Nothing}) at /root/.julia/packages/PyCall/zqDXB/src/pyfncall.jl:29
[9] _pycall!(::PyObject, ::PyObject, ::Tuple{var"#3#4"}, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /root/.julia/packages/PyCall/zqDXB/src/pyfncall.jl:11
[10] (::PyObject)(::Function; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /root/.julia/packages/PyCall/zqDXB/src/pyfncall.jl:86
[11] (::PyObject)(::Function) at /root/.julia/packages/PyCall/zqDXB/src/pyfncall.jl:86
[12] top-level scope at REPL[13]:1
Reference to UDF in Python: https://docs.databricks.com/spark/latest/spark-sql/udf-python.html
In other words, you want to serialize Julia objects (wrapped in Python objects) via Pickle.
I guess we could do this by embedding the Julia serialization format (via the Serialization stdlib) in pickle?
Exactly, @stevengj. How can we accomplish this? Could you provide some guidance, please?
I think it involves overloading __getstate__ and __setstate__ (https://docs.python.org/3/library/pickle.html#object.getstate), but I would have to do a bit of reading on pickle and how it interacts with the C API.
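For readers unfamiliar with these hooks, here is a minimal pure-Python sketch of how __getstate__ and __setstate__ let an object choose what goes into the pickle. The class `HandleWrapper` and its fields are hypothetical stand-ins, not PyCall code; the lock only plays the role of a native handle that pickle cannot serialize on its own.

```python
import pickle
import threading


class HandleWrapper:
    """Hypothetical wrapper: holds opaque bytes plus an unpicklable handle."""

    def __init__(self, payload: bytes):
        self.payload = payload            # opaque serialized bytes
        self._handle = threading.Lock()   # stand-in for a native handle

    def __getstate__(self):
        # Only the opaque bytes are written to the pickle; the handle is dropped.
        return {"payload": self.payload}

    def __setstate__(self, state):
        # Restore the bytes and re-create a fresh handle on load.
        self.payload = state["payload"]
        self._handle = threading.Lock()


wrapper = HandleWrapper(b"\x01\x02")
restored = pickle.loads(pickle.dumps(wrapper))
```

Without the two hooks, `pickle.dumps(wrapper)` would raise a TypeError because of the lock, which is analogous to the jlwrap error reported above.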
Or rather, we probably want the lower-level __reduce__ interface (https://docs.python.org/3/library/pickle.html#object.reduce), which is more error-prone but will give us more control.
Great, @stevengj. If we can overcome this, it would be a huge step for the Julia community, and I would be glad to publish an article showing off this new feature!