jq.py

Not serializable by pickle or cloudpickle

Open • ccaapton opened this issue 1 year ago • 3 comments

I need to apply jq in a PySpark distributed environment, so the 'jq._Program' object needs to be serialized and transported to the remote machines. However, pickling it fails with the following error: TypeError: no default __reduce__ due to non-trivial __cinit__

Code to reproduce:

```python
import pickle

import jq

# Raises TypeError: no default __reduce__ due to non-trivial __cinit__
pickle.dumps(jq.compile('.a'))
```

ccaapton • Sep 23 '24 10:09

Could you pickle the program string instead, rather than the compiled program?

mwilliamson • Sep 23 '24 17:09
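One way to follow this suggestion while still passing a single object around is a wrapper whose `__reduce__` ships only the program text, so the compiled native object is never pickled and is rebuilt lazily on the receiving side. This is a hypothetical sketch: `PicklableProgram` and its `compiler` parameter are not part of jq.py. It is exercised with `re.compile` as a stand-in; with jq.py installed you would pass `jq.compile` instead, assuming that function can be pickled by reference like other module-level functions (not verified here).

```python
import pickle
import re
from typing import Any, Callable, Optional


class PicklableProgram:
    """Hypothetical wrapper that pickles only a program's source text.

    The compiled object (which may hold native state that pickle cannot
    handle) is left out of the pickle and rebuilt lazily after unpickling.
    """

    def __init__(self, source: str, compiler: Callable[[str], Any]) -> None:
        self.source = source
        # Must be picklable by reference, i.e. a module-level function.
        self.compiler = compiler
        self._compiled: Optional[Any] = None

    def __reduce__(self):
        # Recreate the wrapper from (source, compiler); the cached compiled
        # object is deliberately not part of the payload.
        return (type(self), (self.source, self.compiler))

    @property
    def compiled(self) -> Any:
        # Compile on first access, then reuse the cached result.
        if self._compiled is None:
            self._compiled = self.compiler(self.source)
        return self._compiled


# Stand-in demo using re.compile; with jq.py you would pass jq.compile.
prog = PicklableProgram(r"a+", re.compile)
local_match = prog.compiled.fullmatch("aaa")  # compiles locally once
restored = pickle.loads(pickle.dumps(prog))  # arrives uncompiled
```

Compilation still happens once per unpickled copy, so this only helps if that copy is reused across many rows rather than unpickled per call.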

> Could you pickle the program string instead, rather than the compiled program?

Sure, I could compile the string inside the remote function, but I'm concerned about the cost of compiling it every time the function is called, since the function loops over a large dataframe. I profiled it and found that compilation costs 5 ms per call. Serialization/deserialization is usually much faster.

ccaapton • Sep 24 '24 00:09
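The per-call cost can also be avoided without pickling the compiled program at all: ship the program string, compile it once per batch (partition) of rows, and reuse the compiled object for every row in that batch. A minimal sketch, where `apply_per_partition` and its `run` callback are hypothetical names, and a counting stand-in replaces `jq.compile` so the example runs without jq.py installed:

```python
from typing import Any, Callable, Iterable, Iterator, List


def apply_per_partition(
    rows: Iterable[Any],
    source: str,
    compiler: Callable[[str], Any],
    run: Callable[[Any, Any], Any],
) -> Iterator[Any]:
    """Compile `source` once, then apply the compiled program to every row."""
    program = compiler(source)  # paid once per partition, not once per row
    for row in rows:
        yield run(program, row)


# Stand-in compiler that records each call, to show compilation happens once.
compile_calls: List[str] = []


def counting_compiler(source: str) -> str:
    compile_calls.append(source)
    return source.upper()


results = list(
    apply_per_partition([1, 2, 3], ".a", counting_compiler, lambda prog, row: (prog, row))
)
```

In PySpark this would typically be wired up through `RDD.mapPartitions`, for example `rdd.mapPartitions(lambda rows: apply_per_partition(rows, ".a", jq.compile, lambda p, row: p.input(row).first()))`. The exact jq.py call for feeding a value (`input` versus `input_value`) depends on the installed version, so treat that part as an assumption.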

I can't say I'm particularly familiar with pickling, but given that a compiled jq program contains native data, wouldn't an implementation of pickling have to serialise the program string rather than the raw bytes anyway?

mwilliamson • Sep 24 '24 20:09
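For comparison, the standard library handles compiled regular expressions this way: `re` registers a `copyreg` reducer so that pickling a `re.Pattern` stores only its pattern string and flags, and unpickling compiles the pattern again. This is an analogy, not a description of jq.py internals, but it illustrates why a picklable compiled program would likely not remove the compilation cost on the receiving side:

```python
import pickle
import pickletools
import re

pattern = re.compile(r"a+b", re.IGNORECASE)
payload = pickle.dumps(pattern)

# Disassemble the payload: it references re's internal compile function and
# contains the pattern text and flags, not the compiled matcher itself.
pickletools.dis(payload)

# Unpickling calls that compile function again with the stored arguments.
restored = pickle.loads(payload)
```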