jq.py Not serializable by pickle or cloudpickle

I need to apply jq in a distributed PySpark environment, so the 'jq._Program' object needs to be serialized in order to be shipped to the remote machines. Serializing it fails with the following error: TypeError: no default __reduce__ due to non-trivial __cinit__

Code to reproduce:

import pickle
import jq
pickle.dumps(jq.compile('.a'))

Sep 23 '24 10:09 ccaapton

Could you pickle the program string instead, rather than the compiled program?

Sep 23 '24 17:09 mwilliamson

> Could you pickle the program string instead, rather than the compiled program?

Sure, I could compile the string inside the remote function, but I'm concerned about the cost of compiling the string each time the function is called, since it loops over a large dataframe. I profiled it and found that compiling costs 5ms per call. Serialization and deserialization are usually much faster.
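
For reference, one possible way to avoid recompiling on every call is to cache the compiled program per process. This is only a sketch: the names compile_cached and extract are made up for illustration, and it assumes the worker process is reused across rows so that the cache actually survives between calls.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def compile_cached(source):
    # Runs jq.compile at most once per distinct program string in each
    # process; later calls with the same string return the cached program.
    import jq  # imported lazily, so only the worker needs jq available
    return jq.compile(source)


def extract(source, text):
    # Hypothetical per-row function: only the program string travels to
    # the worker, and the compiled program is reused from the cache.
    return compile_cached(source).input_text(text).first()
```

With this shape, only the first call for a given program string in each process pays the compile cost.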

Sep 24 '24 00:09 ccaapton

I can't say I'm particularly familiar with pickling, but given a jq program contains native data, would an implementation of pickling the program not have to serialise the program string rather than the raw bytes anyway?
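
To make that idea concrete, here is a rough sketch of a wrapper that pickles only the program string and recompiles lazily after unpickling. PicklableProgram is a hypothetical helper written for this thread, not something jq.py provides, and I haven't checked how it behaves under cloudpickle.

```python
import pickle


class PicklableProgram:
    """Hypothetical wrapper that pickles the program string, not the compiled program."""

    def __init__(self, source):
        self.source = source
        self._compiled = None  # filled in lazily, once per unpickled copy

    @property
    def program(self):
        if self._compiled is None:
            import jq  # imported lazily, in the process that runs the program
            self._compiled = jq.compile(self.source)
        return self._compiled

    def __reduce__(self):
        # Tell pickle to rebuild the wrapper from the source string alone;
        # the compiled program is never part of the pickled data.
        return (PicklableProgram, (self.source,))


if __name__ == "__main__":
    # Round trip: the copy holds the same string and compiles on first use.
    copy = pickle.loads(pickle.dumps(PicklableProgram(".a")))
    print(copy.source)
```

Each unpickled copy still compiles once, so this moves the compile cost to once per deserialized object rather than removing it.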

Sep 24 '24 20:09 mwilliamson