Pickle Support
As of right now, it's not possible to pickle classes created by PyO3.
This feature would be invaluable in situations where some form of persistence is desirable.
As of right now it has trouble pickling after I call
#[new]
fn __new__(obj: &PyRawObject) -> PyResult<()>{
Otherwise the .__dict__ attributes are maintained prior to initialization with __new__.
Would this mean implementing __getstate__ and __setstate__ methods (cf. https://docs.python.org/3/library/pickle.html#pickling-class-instances)?
For instance, the way pickling works for,
might provide some examples.
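To make the question concrete, here is a minimal pure-Python sketch of the __getstate__ / __setstate__ protocol that pickle follows. The MyClass below is a hypothetical Python stand-in, not the PyO3 class, and only illustrates what pickle expects from the two hooks.

```python
import pickle


class MyClass:
    """Pure-Python stand-in for the Rust-backed class (assumption for illustration)."""

    def __init__(self, num):
        self.num = num

    def __getstate__(self):
        # Return any picklable object that describes the instance.
        return self.num

    def __setstate__(self, state):
        # Called on a fresh instance that was created without running __init__.
        self.num = state


restored = pickle.loads(pickle.dumps(MyClass(42)))
print(restored.num)
```

One caveat from the pickle documentation: if __getstate__ returns a false value (for example 0), __setstate__ is not called on unpickling, so wrapping the state in a tuple or dict is the safer choice.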
For instance, if we take the documentation example for MyClass,
use pyo3::prelude::*;
use pyo3::PyRawObject;

#[pyclass]
struct MyClass {
    num: i32,
}

#[pymethods]
impl MyClass {
    #[new]
    fn new(obj: &PyRawObject, num: i32) {
        obj.init({
            MyClass {
                num,
            }
        });
    }
}
by default we get the following error when pickling this class,
obj = MyClass()
> pickle.dumps(obj)
E TypeError: can't pickle MyClass objects
If we now add the __getstate__ / __setstate__ methods,
fn __getstate__(&self) -> PyResult<i32> {
    Ok(self.num)
}
fn __setstate__(&mut self, state: i32) -> PyResult<()> {
    self.num = state;
    Ok(())
}
we get another exception,
_pickle.PicklingError: Can't pickle <class 'MyClass'>: attribute lookup MyClass on builtins failed
There is some additional step I must be missing here.
@rth: this may be related to the fact that PyO3 exposes all classes as part of the builtins module, because the import mechanism has not been properly implemented, so pickle tries to use builtins.MyClass and fails with the error you reported.
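The lookup failure can be reproduced in plain Python. This is a sketch under the assumption that the problem is only the reported module name: pickle stores instances by reference to their class as module plus name, so a class claiming to live in builtins cannot be found again. The class here is a hypothetical Python stand-in.

```python
import pickle


class MyClass:
    def __init__(self, num):
        self.num = num


# Mimic a class that reports the wrong home module.
MyClass.__module__ = "builtins"

try:
    pickle.dumps(MyClass(1))
    lookup_failed = False
except pickle.PicklingError:
    # pickle could not find MyClass as an attribute of the builtins module.
    lookup_failed = True

# Pointing __module__ at the module that really defines the class fixes the lookup.
MyClass.__module__ = __name__
round_tripped = pickle.loads(pickle.dumps(MyClass(1)))
print(lookup_failed, round_tripped.num)
```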
Thanks @althonos! Opened a separate issue about it in #474.
So by subclassing , to set the __module__
correctly as suggested in https://github.com/PyO3/pyo3/issues/474#issuecomment-489521285, pickling seems to work.
Though, I get a segfault occasionally (i.e. it does seem to be random) at exit. For instance when running a pytest session where one test checks pickling,
gdb --args python3.7 -m pytest -k test_pickle
GNU gdb (GDB) CentOS (7.0.1-45.el5.centos)
[...]
Reading symbols from /opt/_internal/cpython-3.7.1/bin/python3.7...(no debugging symbols found)...done.
(gdb) run
Starting program: /opt/_internal/cpython-3.7.1/bin/python3.7 -m pytest -k test_pickle
warning: Error disabling address space randomization: Operation not permitted
============================================================= test session starts =============================================================
platform linux -- Python 3.7.1, pytest-4.4.1, py-1.8.0, pluggy-0.9.0 -- /opt/_internal/cpython-3.7.1/bin/python3.7
cachedir: .pytest_cache
rootdir: /src/python
collected 1 items / 1 selected
my_module/test_pickle.py::test_pickle PASSED
=================================================== 1 passed in 0.13 seconds ===================================================
During startup program terminated with signal SIGSEGV, Segmentation fault.
(gdb) bt
No stack.
and there is no backtrace. Will try to investigate it later.
The segfault likely occurs because subclassing is broken
How about trying dill? Pickle can't handle lots of pure python serialisation cases. https://pypi.org/project/dill/
Not sure if it's interesting; this snippet just got shared on gitter. https://gist.github.com/ethanhs/fd4123487974c91c7e5960acc9aa2a77
I've got a simple struct that I need to deepcopy. I'm trying to figure out how to pickle my struct (after getting the TypeError: cannot pickle error). The gist above shows how to do it for a single member, but I'm too much of a newb to see how to do this with multiple members.
I tried
pub fn __getstate__(&self, py: Python) -> PyResult<PyObject> {
Ok(PyBytes::new(py, &serialize(&self.foo).unwrap()).to_object(py))
Ok(PyBytes::new(py, &serialize(&self.bar).unwrap()).to_object(py))
}
..but get an error "expected one of `.`, `;`, `?`, `}`, or an operator" after the first Ok.
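@shaolo1 I would just return the tuple of members:
pub fn __getstate__(&self, py: Python) -> PyResult<PyObject> {
    // `?` needs a fallible return type, and the error returned by
    // serialize must be convertible into PyErr (e.g. via map_err).
    Ok((
        PyBytes::new(py, &serialize(&self.foo)?),
        PyBytes::new(py, &serialize(&self.bar)?),
    ).to_object(py))
}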
@davidhewitt Thanks. I'll try that if I encounter it again. I got around the problem by just implementing deepcopy in the parent object and handling the copy there so that pickle support was not needed in my rust object.
I was able to enable pickling by writing the __getstate__, __setstate__, and __getnewargs__ magic methods in pymethods for a pure Rust project using bincode::{deserialize, serialize}. In __getnewargs__ you need to return a tuple of all the arguments __new__ will use on deserialization; otherwise you'll see something like TypeError: MyStruct.__new__() missing 2 required positional arguments: 'my_first_arg' and 'my_second_arg'.
Here is a generic example:
pub fn __setstate__(&mut self, state: Vec<u8>) -> PyResult<()> {
    *self = deserialize(&state).unwrap();
    Ok(())
}
pub fn __getstate__(&self) -> PyResult<Vec<u8>> {
    Ok(serialize(&self).unwrap())
}
pub fn __getnewargs__(&self) -> PyResult<(f64, f64)> {
    Ok((self.my_first_arg, self.my_second_arg))
}
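The role of __getnewargs__ can be seen in a pure-Python sketch. The two classes below are hypothetical stand-ins whose __new__ requires arguments, as a #[new] constructor would; only the first one tells pickle which arguments to pass back.

```python
import pickle


class Point:
    def __new__(cls, x, y):
        self = super().__new__(cls)
        self.x = x
        self.y = y
        return self

    def __getnewargs__(self):
        # Arguments pickle passes to __new__ when rebuilding the object.
        return (self.x, self.y)


class PointNoArgs:
    def __new__(cls, x, y):
        self = super().__new__(cls)
        self.x = x
        self.y = y
        return self


restored = pickle.loads(pickle.dumps(Point(1.0, 2.0)))

try:
    pickle.loads(pickle.dumps(PointNoArgs(1.0, 2.0)))
    missing_args = False
except TypeError:
    # __new__ was called without x and y because no arguments were recorded.
    missing_args = True

print(restored.x, restored.y, missing_args)
```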
Also, here is a code example for the workaround @shaolo1 mentioned. Cloning for deepcopy may be faster than serializing & deserializing (which I guess is how Python deepcopies normally?), but I haven't tested that.
pub fn copy(&self) -> Self {self.clone()}
pub fn __copy__(&self) -> Self {self.clone()}
pub fn __deepcopy__(&self, _memo: &PyDict) -> Self {self.clone()}
That'll allow you to return a clone using copy.copy(), copy.deepcopy(), or by calling the .copy() method.
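On the Python side, the copy module checks for these hooks before anything else. For classes without them it falls back to the same __reduce_ex__ interface that pickle uses, although it does not itself produce a byte stream. A minimal sketch with a hypothetical Config class standing in for the Rust struct:

```python
import copy


class Config:
    def __init__(self, values):
        self.values = list(values)

    def clone(self):
        # Stand-in for Rust's .clone(): builds an independent object.
        return Config(self.values)

    def __copy__(self):
        return self.clone()

    def __deepcopy__(self, memo):
        # memo is unused here because clone() already copies everything.
        return self.clone()


original = Config([1, 2, 3])
shallow = copy.copy(original)      # dispatches to __copy__
deep = copy.deepcopy(original)     # dispatches to __deepcopy__
print(deep.values)
```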
Edits:
- Also important to note I needed to change #[pyclass] to #[pyclass(module = "mymodulename")]
- It seems like bincode is performing rather slowly; I'm trying to figure out how to use serde_bytes to speed things up. Maybe in conjunction with PyBytes? Though I want to avoid the GIL wherever I possibly can.
Yes, Vec<u8> is converted to a Python list, one element per byte. I think you do need to use PyBytes here, and it's irrelevant that you want to avoid the GIL because these are Python methods you're implementing.
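The cost difference is visible from Python alone. This sketch assumes only that the state reaches pickle either as a bytes object or as a list of integers; it compares the two encodings of the same payload.

```python
import pickle

payload = bytes(range(256)) * 4  # 1024 bytes of sample state

# A bytes object is written as one contiguous blob.
as_bytes = pickle.dumps(payload)

# A list of ints is written element by element, one opcode per integer.
as_list = pickle.dumps(list(payload))

print(len(as_bytes), len(as_list))
```

The list form is larger and has to be rebuilt one integer object at a time when loading, which is consistent with the slowdown described here.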
I think you want something like this:
pub fn __setstate__(&mut self, state: &PyBytes) -> PyResult<()> {
    *self = deserialize(state.as_bytes()).unwrap();
    Ok(())
}
pub fn __getstate__<'py>(&self, py: Python<'py>) -> PyResult<&'py PyBytes> {
    Ok(PyBytes::new(py, serialize(&self).unwrap()))
}
pub fn __getnewargs__(&self) -> PyResult<(f64, f64)> {
    Ok((self.my_first_arg, self.my_second_arg))
}
I would also strongly recommend you replace .unwrap() with conversion to actual PyResult errors :)
Woah, yeah that sped up my round-trip serializing and deserializing benchmark by 100x. And thanks for the tip about PyResult errors. I did have to modify __getstate__ ever so slightly to add a reference:
pub fn __getstate__<'py>(&self, py: Python<'py>) -> PyResult<&'py PyBytes> {
    Ok(PyBytes::new(py, &serialize(&self).unwrap()))
}
I also did some benchmarking with my structs regarding the performance of cloning vs. roundtrip pickling and bincode serde, which might be useful to someone:
- Having a __deepcopy__ pymethod that calls .clone() is by far the fastest way I've found of copying a pyo3 object. My benchmark took 1.38 usec.
- The next best thing is having bincode serde methods, whose round trip took 15.6 usec (before the PyBytes change it took 1.28 msec):
  pub fn to_bincode<'py>(&self, py: Python<'py>) -> PyResult<&'py PyBytes> {
      Ok(PyBytes::new(py, &serialize(&self).unwrap()))
  }
  #[classmethod]
  pub fn from_bincode(_cls: &PyType, encoded: &PyBytes) -> PyResult<Self> {
      Ok(deserialize(encoded.as_bytes()).unwrap())
  }
- The least performant is pickling, as expected. I guess Python has a lot more overhead here. It took 439 usec round trip.
~since __setstate__ requires a mutable reference, is there a possibility to have pickle support for a #[pyclass(frozen)] class?~
never mind, I've switched to the __reduce__ method
https://github.com/lycantropos/rithm/blob/765d1990800d47e169f84912b16a9857c0575fff/src/lib.rs#L441-L449
You can also use __getnewargs__ or __getnewargs_ex__, which is the simplest option if you can pass all your state directly back to #[new] when unpickling (I would guess this is true for most frozen classes).
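For comparison, the __reduce__ route mentioned in this thread can be sketched in pure Python. The Ratio class is a hypothetical immutable stand-in for a frozen class: it never needs __setstate__, because pickle rebuilds it by calling the constructor again.

```python
import pickle


class Ratio:
    """Immutable value type; state is fixed at construction."""

    __slots__ = ("_num", "_den")

    def __init__(self, num, den):
        # Bypass the blocking __setattr__ below during construction only.
        object.__setattr__(self, "_num", num)
        object.__setattr__(self, "_den", den)

    def __setattr__(self, name, value):
        raise AttributeError("Ratio is immutable")

    def __reduce__(self):
        # (callable, args): pickle calls Ratio(num, den) when loading.
        return (Ratio, (self._num, self._den))


restored = pickle.loads(pickle.dumps(Ratio(1, 2)))

try:
    restored._num = 5
    mutated = True
except AttributeError:
    mutated = False

print(restored._num, restored._den, mutated)
```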