Use `dill` instead of `pickle` for processing `.pkl` files

Open maronuu opened this issue 11 months ago • 2 comments

Introduce the dill library as the serializer for all .pkl files, replacing pickle.

gokart has its own file processors for various file formats. For .pkl files, we have used the standard pickle library. However, pickle cannot handle a class or function whose metadata is determined dynamically at initialization.

For example, the following class replaces its own method run at initialization by wrapping it with plus1. The pickle library cannot handle such cases. We therefore introduce dill, which is built on pickle and can serialize a wider range of objects.

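import functools
from typing import Callable


def plus1(func: Callable[[], int]) -> Callable[[], int]:
    # functools.wraps copies func's metadata (e.g. __qualname__) onto the wrapper.
    @functools.wraps(func)
    def wrapped() -> int:
        ret = func()
        return ret + 1

    return wrapped


class A:
    run: Callable[[], int]

    def __init__(self) -> None:
        # Replace the bound method with its wrapped version on this instance.
        self.run = plus1(self.run)

    def run(self) -> int:
        return 1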
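
The failure can be reproduced with the standard library alone. The sketch below repeats the class so that it runs on its own; `pickle_error` is a hypothetical helper added for illustration. As far as we can tell, the cause is that `functools.wraps` gives the wrapper the qualified name `A.run`, so pickle looks that name up on the module and finds a different object.

```python
import functools
import pickle
from typing import Callable


def plus1(func: Callable[[], int]) -> Callable[[], int]:
    @functools.wraps(func)
    def wrapped() -> int:
        return func() + 1

    return wrapped


class A:
    def __init__(self) -> None:
        # Shadow the class-level method with a wrapped copy on the instance.
        self.run = plus1(self.run)

    def run(self) -> int:
        return 1


def pickle_error(obj: object) -> str:
    """Hypothetical helper: return pickle's error message, or "" on success."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError) as exc:
        return f"{type(exc).__name__}: {exc}"
    return ""


a = A()
print(a.run())          # 2: the wrapper adds 1 to the original result
print(pickle_error(a))  # non-empty: pickle refuses to serialize the instance
```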

cloudpickle is another candidate, but we adopt dill because it has a longer history and more users. Note that any object that pickle can serialize can also be serialized by dill (https://dill.readthedocs.io/en/latest/#basic-usage).
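
Because dill exposes the same `dump` and `load` functions as pickle, the change inside a .pkl processor can be small. The following is only a minimal sketch under assumptions: `PklFileProcessor` is a hypothetical stand-in and not gokart's actual class, and the fallback to pickle exists only so that the sketch runs where dill is not installed.

```python
import io
import pickle
from typing import Any, BinaryIO

try:
    import dill as serializer  # third-party package, assumed installed via pip
except ImportError:
    serializer = pickle  # fallback so the sketch still runs without dill


class PklFileProcessor:
    """Hypothetical .pkl processor; not gokart's real implementation."""

    def dump(self, obj: Any, file: BinaryIO) -> None:
        # dill.dump and pickle.dump accept the same (obj, file) arguments.
        serializer.dump(obj, file)

    def load(self, file: BinaryIO) -> Any:
        return serializer.load(file)


processor = PklFileProcessor()
buffer = io.BytesIO()
processor.dump({"rows": [1, 2, 3]}, buffer)
buffer.seek(0)
restored = processor.load(buffer)
print(restored)
```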

Compatibility

The dill documentation describes dill as a drop-in replacement for pickle: existing code can be updated to allow complete pickling using `import dill as pickle` or `from dill import dumps, loads`.

As the documentation states, objects that can be serialized by pickle can also be serialized by dill. Additionally, we confirmed that objects dumped by pickle can be loaded with dill.load.

As for storage size, we confirmed that objects serialized by pickle and by dill have the same size.
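
A check along these lines could look like the sketch below. It is not the exact script we ran: `check_compat` and the sample object are hypothetical, and the check is skipped when dill is not installed. It dumps with pickle, loads with dill, and reports both serialized sizes so they can be compared for the objects you care about.

```python
import pickle
from typing import Any, Optional, Tuple

try:
    import dill  # third-party package, assumed installed via pip
except ImportError:
    dill = None


def check_compat(obj: Any) -> Optional[Tuple[bool, int, int]]:
    """Return (loads_equal, pickle_size, dill_size), or None without dill."""
    if dill is None:
        return None
    payload = pickle.dumps(obj)      # written by the old serializer
    loaded = dill.loads(payload)     # read back by the new serializer
    return loaded == obj, len(payload), len(dill.dumps(obj))


sample = {"name": "gokart", "values": [1, 2, 3]}
result = check_compat(sample)
print(result)
```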

maronuu • Feb 29 '24 05:02

@maronuu Thank you for the suggestion!

I have some questions.

  • How much will the storage usage increase?
  • Can dill load objects dumped by the pickle library? I think this is very important for compatibility.

kitagry • Feb 29 '24 08:02

@kitagry Thank you for the comment! I added some notes about the storage usage and compatibility in the PR description.

maronuu • Mar 12 '24 03:03

Thank you!

kitagry • Apr 23 '24 06:04