Pickle.jl
Pandas support
I'm trying to load a pickle file with the load function from Pickle.jl:
julia> bud = load(open("MV-Budget.pkl"))
But it leads to an error:
ERROR: AssertionError: Imcompatible protocol version:
Trying to load version 5 pickle file with version 4 pickler.
Try setting the `proto` keyword argument when loading, e.g. `load(file; proto = 5)`.
If that still failed, please open an issue.
Stacktrace:
[1] execute!(p::Pickler{4}, #unused#::Val{Pickle.OpCodes.PROTO}, arg::UInt8)
@ Pickle C:\Users\Shayan\.julia\packages\Pickle\pwvBM\src\deserializer.jl:178
[2] run!(p::Pickler{4}, op::Pickle.OpCodes.OpCode, io::IOStream)
@ Pickle C:\Users\Shayan\.julia\packages\Pickle\pwvBM\src\deserializer.jl:25
[3] load(p::Pickler{4}, io::IOStream)
@ Pickle C:\Users\Shayan\.julia\packages\Pickle\pwvBM\src\deserializer.jl:15
[4] load(io::IOStream; proto::Int64)
@ Pickle C:\Users\Shayan\.julia\packages\Pickle\pwvBM\src\deserializer.jl:10
[5] load(io::IOStream)
@ Pickle C:\Users\Shayan\.julia\packages\Pickle\pwvBM\src\deserializer.jl:10
[6] top-level scope
@ REPL[6]:1
Try setting the `proto` keyword argument when loading, e.g. `load(file; proto = 5)`.
What is the error log of load("MV-Budget.pkl"; proto = 5)?
That returns this:
Defer(:build, Defer(:newobj, Defer(:pandas.core.frame.DataFrame)), Dict{Any, Any}("_mgr" => Defer(:reduce, Defer(:pandas.core.internals.managers.BlockManager), (Defer(:reduce, Defer(:pandas._libs.internals._unpickle_block), Defer(:reduce, Defer(:numpy.core.numeric._frombuffer), UInt8[0x02, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x04, 0x00 … 0x00, 0x00, 0x07, 0x00, 0x00, 0x00, 0x07, 0x00, 0x00, 0x00], Defer(:build, Defer(:reduce, Defer(:numpy.dtype), i4, false, true), (3, "<", nothing, nothing, nothing, -1, -1, 0)), (17, 24), F), Defer(:reduce, Defer(:builtins.slice), 0, 17, 1), 2),), Any[Defer(:reduce, Defer(:pandas.core.indexes.base._new_Index), Defer(:pandas.core.indexes.base.Index), Dict{Any, Any}("name" => nothing, "data" => Defer(:build, Defer(:reduce, Defer(:numpy.core.multiarray._reconstruct), Defer(:numpy.ndarray), (0,), UInt8[0x62]), (1, (17,), Defer(:build, Defer(:reduce, Defer(:numpy.dtype), O8, false, true), (3, "|", nothing, nothing, nothing, -1, -1, 63)), false, Any["return5", "return10", "return25", "std5", "std10", "std25", "corr5", "corr10", "corr25", "diff_close5", "diff_close10", "diff_close25", "pred_ret5", "pred_vol5", "pred_cor5", "pred_cor10", "pred_cor25"])))), Defer(:reduce, Defer(:pandas.core.indexes.base._new_Index), Defer(:pandas.core.indexes.base.Index), Dict{Any, Any}("name" => nothing, "data" => Defer(:build, Defer(:reduce, Defer(:numpy.core.multiarray._reconstruct), Defer(:numpy.ndarray), (0,), UInt8[0x62]), (1, (24,), Defer(:build, Defer(:reduce, Defer(:numpy.dtype), O8, false, true), (3, "|", nothing, nothing, nothing, -1, -1, 63)), false, Any["MSFT", "PEP", "TSLA", "AMZN", "LKQ", "ABMD", "MSI", "PH", "NKE", "TM" … "EQIX", "EA", "AAP", "TEL", "DG", "EXR", "MDLZ", "FIS", "CRL", "RCL"]))))]), "_metadata" => Any[], "attrs" => Dict{Any, Any}(), "_typ" => "dataframe", "_flags" => Dict{Any, Any}("allows_duplicate_labels" => true)))
But when I read the pickle file with Pandas.jl, I get this output:
julia> df = read_pickle("MV-Budget.pkl")
       return5  return10  return25  std5  std10  ...  pred_ret5  pred_vol5  pred_cor5  pred_cor10  pred_cor25
MSFT         2         1         4     4      4  ...          5          7          3           3           2
PEP          8         8         7     7      9  ...          4         10          4           4           4
TSLA         1         5         6     2      2  ...          1          2          9           9           9
AMZN         4         4         5     1      1  ...          4          5          5           6           7
LKQ          4         4         3    10     10  ...          7          3         10          10          10
ABMD         7         6         1     5      5  ...          6          1          7           8           8
MSI          6         6         8     6      7  ...          1          9          5           5           5
PH           2         3         4     3      3  ...         10          5          1           1           1
NKE          3         2         3     7      6  ...         10          4          2           2           3
TM          10         7         6     9      8  ...          7         10          8           7           7
EOG          1         1         1     6      6  ...         10          6         10          10          10
GOOGL        3         3         4     1      1  ...          8          7          1           3           4
NFLX        10        10        10     4      4  ...          5          1          4           2           3
GS           4         2         2     3      3  ...          2          4          2           1           1
EQIX         7         9         9     1      1  ...          1          7          6           6           5
EA           9         8         8    10     10  ...          8          3          6           5           6
AAP          9        10         9     9      8  ...          9          2         10          10          10
TEL          5         4         2     8      7  ...          9         10          1           1           1
DG          10        10        10     7      9  ...          2          1          4           4           4
EXR          8         5         5     8      7  ...          6          8          9           9           8
MDLZ         7         7         7    10     10  ...          7          9          7           8           9
FIS          5         9         7     4      5  ...          4          8          3           4           2
CRL          6         7        10     5      4  ...          3          6          7           7           6
RCL          1         1         1     2      2  ...          3          4          8           7           7
[24 rows x 17 columns]
That result means the file contains objects Pickle.jl doesn't know how to reconstruct yet. Pandas.jl doesn't have this problem because it calls Python directly. To make this work, we need to add a corresponding method mapping for each method you see in the Defer object.
See #25.
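To make the idea of a "method mapping" concrete, here is a toy sketch in plain Julia. It does not use Pickle.jl at all: `Node` is a hypothetical stand-in for Defer that mirrors the printed `Defer(:reduce, callable, args...)` shape, and `PY2JL` is a hypothetical lookup table, not Pickle.jl's real registration API. Resolving a `:reduce` node means resolving its arguments first, looking up the Python callable's name, and calling the matching Julia function.

```julia
# Toy illustration only: `Node` is a hypothetical stand-in for Pickle.jl's Defer,
# and `PY2JL` is a hypothetical table, not Pickle.jl's real API.
struct Node
    head::Symbol        # operation, e.g. :reduce
    args::Vector{Any}   # for :reduce: Python callable name, then its arguments
end

# Map Python qualified names to Julia functions.
const PY2JL = Dict{String,Function}(
    # slice(start, stop, step) is 0-based and half-open; convert to a 1-based range
    # (assumes a positive step and a non-negative start)
    "builtins.slice" => (start, stop, step) -> (start + 1):step:stop,
    "builtins.list" => (xs...) -> collect(Any, xs),
)

resolve(x) = x                          # plain values pass through unchanged

function resolve(n::Node)
    args = map(resolve, n.args)         # resolve nested nodes first (bottom-up)
    n.head === :reduce || error("unsupported op: $(n.head)")
    name = string(args[1])
    f = get(() -> error("no Julia mapping for $name"), PY2JL, name)
    return f(args[2:end]...)
end
```

The `builtins.slice` entry corresponds to the `Defer(:reduce, Defer(:builtins.slice), 0, 17, 1)` node in the dump, which would resolve to `1:17` (all 17 columns). Supporting the whole DataFrame would mean adding an entry for every other name in the dump (`pandas.core.internals.managers.BlockManager`, `numpy.core.numeric._frombuffer`, and so on) and also handling the `:build` and `:newobj` nodes.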
So... is there any way to convert the Defer into a DataFrame? Or to build the DataFrame from the Defer object?
It's definitely doable. This is how we support a new Python object in Pickle.jl, but we need someone to actually implement it.
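Until that support exists, one possible workaround is to pull the pieces out of the Defer tree by hand and rebuild the table yourself. The sketch below assumes you have already extracted the raw `UInt8` buffer from the `numpy.core.numeric._frombuffer` node, the block shape `(17, 24)`, and the two label arrays from the `_new_Index` nodes. Walking the Defer itself is not shown, since its field layout isn't covered in this thread, and the printed dump is truncated with `…`, so the values must come from the object rather than the text. `block_to_columns` is a hypothetical helper name. It relies on what the dump states: dtype `<i4` (little-endian Int32), order `F` (which matches Julia's column-major layout), and a single block covering all 17 columns (`slice(0, 17, 1)`). pandas stores a block as (number of columns, number of rows).

```julia
# Hypothetical helper: rebuild a column table from pieces of a pickled pandas
# DataFrame that were extracted from the Defer tree by hand.
function block_to_columns(bytes::AbstractVector{UInt8}, shape::Tuple{Int,Int},
                          colnames::AbstractVector, rownames::AbstractVector)
    ncols, nrows = shape                        # pandas block shape: (n_columns, n_rows)
    length(bytes) == 4 * ncols * nrows ||
        throw(ArgumentError("expected $(ncols * nrows) Int32 values"))
    vals = ltoh.(reinterpret(Int32, bytes))     # dtype "<i4": little-endian Int32
    block = reshape(vals, ncols, nrows)         # order 'F' is Julia's column-major layout
    # Row i of the block holds DataFrame column i; row labels become an :index column.
    labels = (:index, Symbol.(string.(colnames))...)
    cols = (string.(rownames), (block[i, :] for i in 1:ncols)...)
    return NamedTuple{labels}(cols)
end
```

The result is a NamedTuple of vectors, which Tables.jl treats as a column table, so `DataFrame(t)` should convert it when DataFrames.jl is installed. As a sanity check, the first three Int32 values in the dump's buffer are 2, 1, 4, which match the start of the MSFT row in the Pandas.jl output.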