Not serializable by pickle or cloudpickle
I need to apply jq in a PySpark distributed environment, so the `jq._Program` object needs to be serialized to be shipped to remote machines. But it fails with the error below:
TypeError: no default __reduce__ due to non-trivial __cinit__
Code to reproduce:

```python
import pickle
import jq

pickle.dumps(jq.compile('.a'))
```
Could you pickle the program string instead, rather than the compiled program?
Sure, I could compile the string inside the remote function, but I'm concerned about the cost of compiling it on every call, since the function loops over a large dataframe. I profiled it and found it costs about 5 ms per call; serialization/deserialization is usually much faster.
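In the meantime, a workaround along these lines might help: a small wrapper (hypothetical, not part of the jq library) that pickles only the program string and recompiles lazily on the worker, caching the compiled program so the 5 ms cost is paid once per process rather than once per call. It assumes `jq.compile` and the `.input()` method of the compiled program.

```python
import pickle


class PicklableProgram:
    """Hypothetical wrapper around a jq program that survives pickling.

    Only the program string crosses the wire; compilation happens
    lazily on first use and the result is cached per process.
    """

    def __init__(self, program_string):
        self.program_string = program_string
        self._compiled = None  # compiled lazily, once per process

    def _program(self):
        if self._compiled is None:
            import jq  # deferred so unpickling alone does not require jq
            self._compiled = jq.compile(self.program_string)
        return self._compiled

    def input(self, value):
        # Delegate to the (cached) compiled program.
        return self._program().input(value)

    def __getstate__(self):
        # Ship only the source string; drop the unpicklable compiled program.
        return {"program_string": self.program_string}

    def __setstate__(self, state):
        self.program_string = state["program_string"]
        self._compiled = None
```

On the driver you would build `PicklableProgram('.a')` and use it inside the distributed function as usual; each executor recompiles once on first call.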
I can't say I'm particularly familiar with pickling, but given that a jq program contains native data, wouldn't an implementation of pickling have to serialise the program string rather than the raw bytes anyway?
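That is essentially how it would work: since the compiled bytes cannot be pickled, a reducer would emit the program string plus the compile function, and unpickling recompiles. The pattern can be sketched with `copyreg`, which registers a reducer for a type without touching the type itself (useful for extension types with a non-trivial `__cinit__`). The stand-in `Compiled` class and `compile_` function below are placeholders for `jq._Program` and `jq.compile`:

```python
import copyreg
import pickle


class Compiled:
    # Stand-in for jq._Program: imagine this also held unpicklable
    # native state produced during compilation.
    def __init__(self, source):
        self.source = source


def compile_(source):
    # Stand-in for jq.compile: "recompiles" from the source string.
    return Compiled(source)


def _reduce_compiled(program):
    # Pickle only the source string; unpickling calls compile_ again.
    return (compile_, (program.source,))


# Register the reducer; pickle consults copyreg's dispatch table
# before falling back to the type's own __reduce__.
copyreg.pickle(Compiled, _reduce_compiled)

restored = pickle.loads(pickle.dumps(compile_('.a')))
```

A reducer like this on the real `jq._Program` would make `pickle.dumps(jq.compile('.a'))` work transparently, at the cost of one recompilation per unpickle.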