
How to read a .pickle file written by Python pandas.to_pickle()? When I use pickle.Load I get: REDUCE requires a Callable object: &types.GenericClass{Module:"numpy.core.multiarray", Name:"_reconstruct"}

Open zhaoqiji opened this issue 3 years ago • 5 comments

zhaoqiji avatar Apr 19 '21 02:04 zhaoqiji

You need to implement some custom Go types to handle these additional Python classes. The example shows u.FindClass = makePickleFindClass(u.FindClass).

The np.core.multiarray._reconstruct() function has three arguments. See: http://pyopengl.sourceforge.net/pydoc/numpy.core.multiarray.html#-_reconstruct. Argument 1 specifies the return type - in my case it was an np.ndarray.

The great thing is that it is possible to add these classes in your own functions/package while using this code as is. It can be a long journey re-implementing the structure of all the classes along the way, though.

(I haven't tried pandas pickles, but there seems to be some overlap due to the multiarray/ndarray etc. classes.)
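For concreteness, here is a minimal sketch of such a FindClass override. It assumes the Unpickler exposes a FindClass hook of the form func(module, name string) (interface{}, error), as the u.FindClass assignment above suggests, and that an object returned from it can satisfy REDUCE by providing a Call method; the reconstructPlaceholder type is made up for illustration and does not build a real array:

package main

import (
	"fmt"
	"os"

	"github.com/nlpodyssey/gopickle/pickle"
)

// reconstructPlaceholder stands in for numpy.core.multiarray._reconstruct.
// Call only echoes its arguments back; a real implementation would allocate
// an ndarray-like Go value and later accept the state numpy attaches to it.
type reconstructPlaceholder struct{}

func (reconstructPlaceholder) Call(args ...interface{}) (interface{}, error) {
	return args, nil
}

func main() {
	f, err := os.Open("data.pkl") // hypothetical pandas/numpy pickle
	if err != nil {
		panic(err)
	}
	defer f.Close()

	u := pickle.NewUnpickler(f)

	// Intercept the one numpy symbol the unpickler cannot resolve on its own;
	// report anything else as unknown, so the next error tells you which
	// type to implement next.
	u.FindClass = func(module, name string) (interface{}, error) {
		if module == "numpy.core.multiarray" && name == "_reconstruct" {
			return reconstructPlaceholder{}, nil
		}
		return nil, fmt.Errorf("class not found: %s %s", module, name)
	}

	data, err := u.Load()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%#v\n", data)
}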

Konstanty avatar Apr 21 '21 11:04 Konstanty

Hi, sorry for being so late. Thanks @Konstanty for your reply: a custom FindClass is exactly the way to go in this case.

By default, when a custom/unknown class/object/function is encountered during the unpickling process, a GenericClass is used. The fields Module and Name are provided exactly for the purpose of manual inspection.

In extremely simple cases, a GenericClass is enough, but more often the unpickler itself is instructed to instantiate specific classes and/or call specific methods. In this case, you can define a custom FindClass implementation which intercepts numpy.core.multiarray._reconstruct and returns a custom object you have to define (e.g. a struct). That object should then implement the Callable interface, i.e. provide the Call method.
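As a rough illustration only (not gopickle API): a struct along these lines, with the Call method described above, would cover the three-argument _reconstruct(subtype, shape, dtype) call mentioned earlier in the thread. The NDArray type and its field names are made up; the actual array contents arrive in a later step of the pickle stream and would need additional handling:

package numpyext // hypothetical helper package, not part of gopickle

import "fmt"

// NDArray is a made-up, minimal Go stand-in for numpy.ndarray.
type NDArray struct {
	Subtype interface{} // usually the numpy.ndarray class itself
	Shape   interface{} // the initial shape, typically (0,)
	DType   interface{} // the initial dtype descriptor
}

// Reconstruct plays the role of numpy.core.multiarray._reconstruct:
// it only allocates an empty array object; the data is attached afterwards.
type Reconstruct struct{}

func (Reconstruct) Call(args ...interface{}) (interface{}, error) {
	if len(args) != 3 {
		return nil, fmt.Errorf("_reconstruct: want 3 arguments, got %d", len(args))
	}
	return &NDArray{Subtype: args[0], Shape: args[1], DType: args[2]}, nil
}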

For the actual implementation, my advice is to rely on the original Python code, already mentioned in the response above. I suggest starting by providing the minimum set of types and functionality, then expanding it as required based on the next error reported by the unpickler... and so on, until all objects and functions are covered.

That was the approach we adopted for the Python-derived types in the pytorch package. See for example storage.go.

Have you already made some progress? Since numpy is so widespread, we might be able to help you implement some types directly, maybe also including some new code in the project.

marco-nicola avatar May 16 '21 15:05 marco-nicola

Thank you for your detailed answers, I will try it. I haven't made any progress yet, though. In the end I used Python to write it as a CSV file and then read the CSV file instead. I'd be excited if you implemented some types directly.

zhaoqiji avatar May 19 '21 07:05 zhaoqiji

Thank you for your reply. I'm sorry for being here so late. In the end I used Python to write it as a CSV file and then read the CSV file instead.

zhaoqiji avatar May 19 '21 07:05 zhaoqiji

I met the same problem. Is there any new feature to support numpy pickles?

iammeizu avatar May 12 '22 03:05 iammeizu

I've met the same problem. So another solution is to convert the pickle file to a CSV file and read that instead?

OctopusLian avatar Nov 24 '22 01:11 OctopusLian

Will a solution to this problem be provided?

package main

import (
	"github.com/nlpodyssey/gopickle/pytorch"
)

func main() {
	if _, err := pytorch.Load("model.pt"); err != nil {
		panic(err.Error())
	}
}

panic: class not found: numpy.core.multiarray _reconstruct

aiwaki avatar Apr 04 '23 18:04 aiwaki

FYI, I've implemented something along these lines. https://github.com/sbinet/npyio/pull/22 seems to be able to read numpy.ndarrays that have been pickled.

at least, these kinds of arrays:

import pickle
import numpy as np

# a ragged object-dtype array, pickled with the default protocol
arr = np.array([[1], [2, "3"], [4, 5, 6]], dtype="object")
with open("foo.pkl", "wb") as f:
    pickle.dump(arr, f)

On the Go side, github.com/sbinet/npyio/npy exports func ClassLoader(module, name string) (any, error), which registers the needed bits to read back npy.Array and npy.ArrayDescr.
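A minimal sketch of how that could be plugged into a gopickle Unpickler, assuming the ClassLoader signature quoted above and a FindClass hook of the form func(module, name string) (interface{}, error) on pickle.Unpickler (untested, written only from the descriptions in this thread):

package main

import (
	"fmt"
	"os"

	"github.com/nlpodyssey/gopickle/pickle"
	"github.com/sbinet/npyio/npy"
)

func main() {
	f, err := os.Open("foo.pkl") // the file produced by the Python snippet above
	if err != nil {
		panic(err)
	}
	defer f.Close()

	u := pickle.NewUnpickler(f)
	// Delegate unknown classes (ndarray, dtype, _reconstruct, ...) to npyio,
	// which knows how to rebuild npy.Array and npy.ArrayDescr values.
	u.FindClass = npy.ClassLoader

	data, err := u.Load()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%#v\n", data)
}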

HTH, -s

sbinet avatar Nov 24 '23 17:11 sbinet

If nobody shouts out, I'll consider this as fixed (in sbinet/npyio) and close this issue by the end of the week.

sbinet avatar Nov 29 '23 09:11 sbinet

fixed by https://github.com/sbinet/npyio/pull/22

sbinet avatar Dec 05 '23 09:12 sbinet