gokart
Use `dill` instead of `pickle` for processing `.pkl` files
Introduce the `dill` library as a serializer instead of `pickle` for all `.pkl` files.
gokart has its own file processors for various file formats. For `.pkl` files, we have used the standard `pickle` library. However, it cannot handle a class or function whose metadata is determined dynamically at initialization.

For example, the following class replaces its own method `run` at initialization by using the wrapper `plus1`. The `pickle` library cannot handle such cases. Thus we introduce `dill`, which is built on `pickle` and can handle a wider variety of objects.
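
    import functools
    from typing import Callable


    def plus1(func: Callable[[], int]) -> Callable[[], int]:
        """Wrap func so that its result is incremented by one."""
        @functools.wraps(func)
        def wrapped() -> int:
            ret = func()
            return ret + 1
        return wrapped


    class A:
        run: Callable[[], int]

        def __init__(self) -> None:
            # Replace the bound method with a wrapped version on this instance.
            self.run = plus1(self.run)

        def run(self) -> int:
            return 1
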
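To make the failure concrete, here is a minimal sketch that reproduces it with the standard library only. The helper `can_pickle` is a hypothetical name introduced for this illustration and is not part of gokart. `functools.wraps` copies the qualified name `A.run` onto the wrapper, so `pickle` looks that name up, finds the original method instead of the wrapper, and refuses to serialize it.

```python
import functools
import pickle


def plus1(func):
    @functools.wraps(func)
    def wrapped() -> int:
        return func() + 1
    return wrapped


class A:
    def __init__(self) -> None:
        # The instance attribute now shadows the method defined on the class.
        self.run = plus1(self.run)

    def run(self) -> int:
        return 1


def can_pickle(obj: object) -> bool:
    """Hypothetical helper: report whether stdlib pickle can serialize obj."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError):
        return False
    return True


a = A()
print(a.run())        # 2
print(can_pickle(a))  # False
```

The instance still behaves as expected when called; only serialization with the standard `pickle` fails.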
`cloudpickle` is another potential candidate, but we adopt `dill` because it has a longer history and more users. Note that objects that can be serialized by `pickle` can also be serialized by `dill` (https://dill.readthedocs.io/en/latest/#basic-usage ).
Compatibility
> dill is a drop-in replacement for pickle. Existing code can be updated to allow complete pickling using:
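The quoted sentence refers to swapping the import; to our understanding the idiom in the dill documentation is `import dill as pickle`. A minimal sketch of that swap follows. The fallback to the standard `pickle` and the helper `roundtrip` are assumptions added so the example runs even where `dill` is not installed; they are not part of gokart.

```python
import io

try:
    import dill as pickle  # drop-in swap: the rest of the code is unchanged
except ImportError:
    import pickle  # assumption: fall back so the sketch runs without dill


def roundtrip(obj):
    """Hypothetical helper: dump obj to an in-memory buffer and load it back."""
    buffer = io.BytesIO()
    pickle.dump(obj, buffer)
    buffer.seek(0)
    return pickle.load(buffer)


restored = roundtrip({"a": [1, 2, 3], "b": ("x", 4.5)})
print(restored)
```

Because `dill` exposes the same `dump`, `load`, `dumps`, and `loads` entry points, call sites do not need to change.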
As mentioned in the documentation, objects that can be serialized by `pickle` can also be serialized by `dill`. Additionally, we confirmed that objects dumped by `pickle` can be loaded via `dill.load`.
As for storage size, we confirmed that objects serialized by `pickle` and by `dill` have the same size.
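The two checks above can be sketched roughly as follows. This is an illustration rather than the exact procedure we ran: `check_compat` and the sample object are hypothetical, and the fallback to the standard `pickle` is an assumption so that the snippet runs without `dill` installed.

```python
import pickle

try:
    import dill
except ImportError:
    dill = pickle  # assumption: fall back so the sketch runs without dill


def check_compat(obj):
    """Hypothetical helper: cross-load pickle output with dill and compare sizes."""
    by_pickle = pickle.dumps(obj)
    by_dill = dill.dumps(obj)
    # Bytes written by the standard pickle are read back through dill.
    loaded = dill.loads(by_pickle)
    return loaded == obj, len(by_pickle), len(by_dill)


ok, pickle_size, dill_size = check_compat({"values": list(range(100)), "name": "task"})
print(ok, pickle_size, dill_size)
```

Running this on representative task outputs shows whether existing `.pkl` caches remain loadable and how the serialized sizes compare.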
@maronuu Thank you for the suggestion!
I have some questions.
- How much will the storage usage increase?
- Can `dill` load objects dumped by the `pickle` library? I think this is very important for compatibility.
@kitagry Thank you for the comment! I added some notes about the storage usage and compatibility in the PR description.
Thank you!