Feather.jl

Receive "ArgumentError: Data is not in feather format" when reading dataframe written from Python

Status: Open • def-mycroft opened this issue 5 years ago • 3 comments

Hello, apologies in advance if I'm missing something simple here.

I want to write a dataframe to Feather using Python and then load it into Julia. When I attempt this, I receive the error "ArgumentError: Data is not in feather format".

So, to provide a reproducible example, when I write out a dataframe in Python like this:

from io import StringIO
import pandas as pd
import feather
# Wrap the literal JSON in StringIO; recent pandas versions deprecate passing a raw JSON string to read_json.
df = pd.read_json(StringIO('{"open":{"0":443.9,"1":443.9,"2":443.97,"3":443.5,"4":443.8},"high":{"0":443.9,"1":443.9,"2":443.97,"3":443.5,"4":443.98},"low":{"0":443.9,"1":443.9,"2":443.6,"3":443.5,"4":443.8},"close":{"0":443.9,"1":443.9,"2":443.6,"3":443.5,"4":443.98},"volume":{"0":436,"1":264,"2":1122,"3":202,"4":3202}}'))
feather.write_dataframe(df, 'from-py.feather')

...and then try to load it into Julia:

using Feather
df = Feather.read("from-py.feather")

...I receive:

ERROR: ArgumentError: Data is not in feather format: header = UInt8[0x41, 0x52, 0x52, 0x4f], footer = UInt8[0x52, 0x4f, 0x57, 0x31].
Stacktrace:
 [1] validatedata(::Array{UInt8,1}) at /home/dasenbrj/.julia/packages/Feather/pbm3o/src/loaddata.jl:11
 [2] #loaddata#6 at /home/dasenbrj/.julia/packages/Feather/pbm3o/src/loaddata.jl:17 [inlined]
 [3] #loaddata at ./none:0 [inlined]
 [4] #Source#7(::Bool, ::Type, ::String) at /home/dasenbrj/.julia/packages/Feather/pbm3o/src/source.jl:17
 [5] Type at ./none:0 [inlined]
 [6] #read#10(::Bool, ::Function, ::String) at /home/dasenbrj/.julia/packages/Feather/pbm3o/src/source.jl:69
 [7] read(::String) at /home/dasenbrj/.julia/packages/Feather/pbm3o/src/source.jl:69
 [8] top-level scope at none:0
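The header and footer bytes in the error are plain ASCII. Decoding them is a quick way to see what the file actually starts and ends with. This is a small sketch that just copies the bytes from the message above. Judging by that message, Feather.jl appears to check only the first 4 and last 4 bytes of the file.

```python
# Header and footer bytes copied from the Feather.jl error message.
header = bytes([0x41, 0x52, 0x52, 0x4F])
footer = bytes([0x52, 0x4F, 0x57, 0x31])

# Decode as ASCII to see which magic string the file carries.
print(header.decode("ascii"))  # ARRO
print(footer.decode("ascii"))  # ROW1
```

Together they are the two ends of "ARROW1", the magic string of an Arrow IPC file, not the "FEA1" magic a Feather V1 reader expects.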

Package versions etc:

  • Ubuntu 18.04.4 LTS
  • Julia Version 1.0.5
  • Julia Feather 0.5.6
  • Python 3.8.2
  • Python Feather 0.4.1
  • Python pyarrow 0.17.0

def-mycroft • May 17 '20 12:05

This is because pyarrow now writes Feather V2 by default (starting with pyarrow 0.17.0, the version listed above). Feather V2 is simply the Arrow IPC file format written to disk, so its metadata is completely different from Feather V1.
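A file's magic bytes show which format you have. Feather V1 files begin and end with `FEA1`. Feather V2 files are Arrow IPC files, which begin with `ARROW1` (followed by two padding bytes) and end with `ARROW1`. The following is a minimal sketch based on those documented layouts. The function name `feather_version` and its return labels are hypothetical, and it does no further validation of the contents.

```python
import os


def feather_version(path):
    """Guess the Feather format version from magic bytes (hypothetical helper).

    Assumes Feather V1 files start and end with b"FEA1", and Feather V2 /
    Arrow IPC files start and end with b"ARROW1".
    """
    size = os.path.getsize(path)
    if size < 8:
        return "unknown"  # too short to hold a magic string at both ends
    with open(path, "rb") as f:
        head = f.read(6)
        f.seek(-6, os.SEEK_END)  # last 6 bytes of the file
        tail = f.read(6)
    if head.startswith(b"FEA1") and tail.endswith(b"FEA1"):
        return "v1"
    if head == b"ARROW1" and tail == b"ARROW1":
        return "v2"
    return "unknown"
```

Running `feather_version("from-py.feather")` on the file from the original report should return `"v2"`.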

I am now deep into a complete rewrite of the Arrow.jl package, which will support reading and writing Feather V2. This package will likely be moved into legacy mode and support only reading and writing Feather V1.

I have added a note to the README about this and will update it once Arrow.jl is complete. I'll also make a post on the Julia Discourse. It'll probably be another few weeks before I have unit tests and all and am ready for a release, but keep an eye out if you're still interested. I won't support everything in the Arrow standard right out of the gate (it's quite extensive by now), but simple dataframes like the one you show here will certainly be supported initially.

ExpandingMan • May 17 '20 14:05

Thanks for the note and the work, @ExpandingMan.

I'll leave this issue open for now so it is visible to others while the rewrite of Arrow.jl is in progress.

def-mycroft • May 17 '20 15:05

Any news on this issue?

chrizMM • Jan 07 '22 19:01