Pickle.jl

Pandas support

Open · shayandavoodii opened this issue 2 years ago · 6 comments

I'm trying to load a pickle file with the load function from Pickle.jl:

julia> bud = load(open("MV-Budget.pkl"))

But it leads to an error:

ERROR: AssertionError: Imcompatible protocol version:
    Trying to load version 5 pickle file with version 4 pickler.
    Try setting the `proto` keyword argument when loading, e.g. `load(file; proto = 5)`.
    If that still failed, please open an issue.

Stacktrace:
 [1] execute!(p::Pickler{4}, #unused#::Val{Pickle.OpCodes.PROTO}, arg::UInt8)
   @ Pickle C:\Users\Shayan\.julia\packages\Pickle\pwvBM\src\deserializer.jl:178
 [2] run!(p::Pickler{4}, op::Pickle.OpCodes.OpCode, io::IOStream)
   @ Pickle C:\Users\Shayan\.julia\packages\Pickle\pwvBM\src\deserializer.jl:25
 [3] load(p::Pickler{4}, io::IOStream)
   @ Pickle C:\Users\Shayan\.julia\packages\Pickle\pwvBM\src\deserializer.jl:15
 [4] load(io::IOStream; proto::Int64)
   @ Pickle C:\Users\Shayan\.julia\packages\Pickle\pwvBM\src\deserializer.jl:10
 [5] load(io::IOStream)
   @ Pickle C:\Users\Shayan\.julia\packages\Pickle\pwvBM\src\deserializer.jl:10
 [6] top-level scope
   @ REPL[6]:1

shayandavoodii, Aug 30 '22 08:08

> Try setting the `proto` keyword argument when loading, e.g. `load(file; proto = 5)`.

What is the error log of load("MV-Budget.pkl"; proto = 5)?

chengchingwen, Aug 30 '22 17:08

> Try setting the `proto` keyword argument when loading, e.g. `load(file; proto = 5)`.
>
> What is the error log of load("MV-Budget.pkl"; proto = 5)?

That will lead to this:

Defer(:build, Defer(:newobj, Defer(:pandas.core.frame.DataFrame)), Dict{Any, Any}("_mgr" => Defer(:reduce, Defer(:pandas.core.internals.managers.BlockManager), (Defer(:reduce, Defer(:pandas._libs.internals._unpickle_block), Defer(:reduce, Defer(:numpy.core.numeric._frombuffer), UInt8[0x02, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x04, 0x00  …  0x00, 0x00, 0x07, 0x00, 0x00, 0x00, 0x07, 0x00, 0x00, 0x00], Defer(:build, Defer(:reduce, Defer(:numpy.dtype), i4, false, true), (3, "<", nothing, nothing, nothing, -1, -1, 0)), (17, 24), F), Defer(:reduce, Defer(:builtins.slice), 0, 17, 1), 2),), Any[Defer(:reduce, Defer(:pandas.core.indexes.base._new_Index), Defer(:pandas.core.indexes.base.Index), Dict{Any, Any}("name" => nothing, "data" => Defer(:build, Defer(:reduce, Defer(:numpy.core.multiarray._reconstruct), Defer(:numpy.ndarray), (0,), UInt8[0x62]), (1, (17,), Defer(:build, Defer(:reduce, Defer(:numpy.dtype), O8, false, true), (3, "|", nothing, nothing, nothing, -1, -1, 63)), false, Any["return5", "return10", "return25", "std5", "std10", "std25", "corr5", "corr10", "corr25", "diff_close5", "diff_close10", "diff_close25", "pred_ret5", "pred_vol5", "pred_cor5", "pred_cor10", "pred_cor25"])))), Defer(:reduce, Defer(:pandas.core.indexes.base._new_Index), Defer(:pandas.core.indexes.base.Index), Dict{Any, Any}("name" => nothing, "data" => Defer(:build, Defer(:reduce, Defer(:numpy.core.multiarray._reconstruct), Defer(:numpy.ndarray), (0,), UInt8[0x62]), (1, (24,), Defer(:build, Defer(:reduce, Defer(:numpy.dtype), O8, false, true), (3, "|", nothing, nothing, nothing, -1, -1, 63)), false, Any["MSFT", "PEP", "TSLA", "AMZN", "LKQ", "ABMD", "MSI", "PH", "NKE", "TM"  …  "EQIX", "EA", "AAP", "TEL", "DG", "EXR", "MDLZ", "FIS", "CRL", "RCL"]))))]), "_metadata" => Any[], "attrs" => Dict{Any, Any}(), "_typ" => "dataframe", "_flags" => Dict{Any, Any}("allows_duplicate_labels" => true)))

But if I read the pickle file with Pandas.jl, I get this output:

julia> df = read_pickle("MV-Budget.pkl")
       return5  return10  return25  std5  std10  ...  pred_ret5  pred_vol5  pred_cor5  pred_cor10  pred_cor25
MSFT         2         1         4     4      4  ...          5          7          3           3           2
PEP          8         8         7     7      9  ...          4         10          4           4           4
TSLA         1         5         6     2      2  ...          1          2          9           9           9
AMZN         4         4         5     1      1  ...          4          5          5           6           7
LKQ          4         4         3    10     10  ...          7          3         10          10          10
ABMD         7         6         1     5      5  ...          6          1          7           8           8
MSI          6         6         8     6      7  ...          1          9          5           5           5
PH           2         3         4     3      3  ...         10          5          1           1           1
NKE          3         2         3     7      6  ...         10          4          2           2           3
TM          10         7         6     9      8  ...          7         10          8           7           7
EOG          1         1         1     6      6  ...         10          6         10          10          10
GOOGL        3         3         4     1      1  ...          8          7          1           3           4
NFLX        10        10        10     4      4  ...          5          1          4           2           3
GS           4         2         2     3      3  ...          2          4          2           1           1
EQIX         7         9         9     1      1  ...          1          7          6           6           5
EA           9         8         8    10     10  ...          8          3          6           5           6
AAP          9        10         9     9      8  ...          9          2         10          10          10
TEL          5         4         2     8      7  ...          9         10          1           1           1
DG          10        10        10     7      9  ...          2          1          4           4           4
EXR          8         5         5     8      7  ...          6          8          9           9           8
MDLZ         7         7         7    10     10  ...          7          9          7           8           9
FIS          5         9         7     4      5  ...          4          8          3           4           2
CRL          6         7        10     5      4  ...          3          6          7           7           6
RCL          1         1         1     2      2  ...          3          4          8           7           7

[24 rows x 17 columns]

shayandavoodii, Aug 30 '22 19:08

> That will lead to this:

That result means the file contains Python objects that Pickle.jl doesn't know how to reconstruct, so it leaves them as Defer objects. Pandas.jl doesn't have this problem because it calls Python directly. To make this work, we need to add a corresponding method mapping for each method you see in the Defer object.
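To illustrate what "adding a method mapping" means, here is a toy model. It is not Pickle.jl's actual API: `ToyDefer`, `MAPPING` and `resolve` are hypothetical names, and Pickle.jl's own Defer type and registration mechanism are not reproduced here. A deferred call is modeled as a qualified Python name plus its arguments, and a table maps names to Julia functions that are applied bottom-up. The only mapping filled in is `builtins.slice`, which appears in the output above as `Defer(:reduce, Defer(:builtins.slice), 0, 17, 1)`; anything without a mapping stays deferred, just as the pandas objects did.

```julia
# Toy model of a deferred call: a qualified Python name plus its arguments.
# `ToyDefer` is hypothetical and is NOT Pickle.jl's own Defer type.
struct ToyDefer
    name::String
    args::Vector{Any}
end

# Hypothetical mapping from qualified Python names to Julia functions.
# Python's slice(start, stop, step) is 0-based and half-open; assuming a
# positive integer step, it becomes the 1-based Julia range (start+1):step:stop.
const MAPPING = Dict{String,Function}(
    "builtins.slice" => ((start, stop, step) -> (start + 1):step:stop),
)

# Plain values resolve to themselves.
resolve(x) = x

# Resolve arguments first, then apply the mapping for this name if one exists.
function resolve(d::ToyDefer)
    args = Any[resolve(a) for a in d.args]
    f = get(MAPPING, d.name, nothing)
    # Unknown names stay deferred, with their (partially resolved) arguments.
    return f === nothing ? ToyDefer(d.name, args) : f(args...)
end

# Usage: an unmapped pandas call wrapping a slice; only the slice is resolved.
tree = ToyDefer("pandas.core.frame.DataFrame", Any[ToyDefer("builtins.slice", Any[0, 17, 1])])
resolved = resolve(tree)
```

Supporting pandas along these lines would mean filling in one entry per name that shows up in the Defer output (`numpy.dtype`, `numpy.core.numeric._frombuffer`, `pandas.core.indexes.base._new_Index`, and so on).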

chengchingwen, Aug 30 '22 22:08

see #25

zsz00, Sep 22 '22 11:09

So, is there any way to convert the Defer object into a DataFrame, or to build the DataFrame from the Defer object?

DarioSlaifsteinSk, Jun 11 '24 13:06

> build the DataFrame from the Defer obj?

It's definitely doable. Adding such a mapping is how we support a new Python object in Pickle.jl, but we need someone to actually implement it.
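For this particular file, the Defer dump above already holds the data pandas uses: a raw byte buffer passed to `numpy.core.numeric._frombuffer` with dtype `i4` and byte order `"<"` (little-endian int32), shape `(17, 24)` and order `F`, placed at `slice(0, 17, 1)`, plus two Index objects with the 17 column names and the 24 ticker labels. The sketch below is a minimal Base-Julia illustration of turning such pieces into a table. It assumes they have already been extracted from the tree (that step is not shown), and `decode_block` and `block_to_table` are hypothetical helpers, not part of Pickle.jl.

```julia
# Hypothetical helpers operating on values already pulled out of the Defer tree.

# Equivalent of numpy `_frombuffer(bytes, dtype('<i4'), (nitems, nrows), 'F')`.
function decode_block(bytes::Vector{UInt8}, nitems::Integer, nrows::Integer)
    length(bytes) == 4 * nitems * nrows ||
        throw(ArgumentError("buffer size does not match shape"))
    vals = ltoh.(reinterpret(Int32, bytes))  # '<i4': little-endian Int32
    # Julia arrays are column-major, like numpy's Fortran ('F') order,
    # so a plain reshape reproduces the (nitems, nrows) block.
    return reshape(vals, Int(nitems), Int(nrows))
end

# pandas stores the block as (columns x rows); transpose so that each row
# corresponds to one index label, and keep the labels next to the values.
function block_to_table(block::AbstractMatrix, colnames::Vector{String}, index::Vector{String})
    size(block) == (length(colnames), length(index)) ||
        throw(DimensionMismatch("labels do not match block shape"))
    return (index = index, columns = colnames, values = permutedims(block))
end

# Usage with a tiny 3-column x 2-row block; values and row labels are illustrative.
bytes = collect(reinterpret(UInt8, htol.(Int32[2, 1, 4, 5, 6, 7])))
block = decode_block(bytes, 3, 2)
table = block_to_table(block, ["return5", "return10", "return25"], ["row1", "row2"])
println(table.values)  # one row per index label
```

Decoding needs no transpose because Julia's memory layout already matches order `F`; the single `permutedims` only switches from pandas' internal (columns x rows) block to the rows-by-columns view that Pandas.jl prints. With DataFrames.jl loaded, `DataFrame(table.values, table.columns)` should then give a DataFrame with those column names, with the index labels added as a separate column. Files with mixed column dtypes contain several blocks, each with its own placement, which this sketch does not handle.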

chengchingwen, Jun 11 '24 14:06