
Cannot open file (failed to parse internal XML file)

Open · tbeason opened this issue 10 months ago · 1 comment

I'm not sure what the problem is here, but I've encountered a file that I can't open. Luckily, the file is public, so maybe somebody can figure this out. Link: https://www.aqr.com/-/media/AQR/Documents/Insights/Data-Sets/Quality-Minus-Junk-Factors-Monthly.xlsx

With that file downloaded, I try this:

using XLSX, DataFrames, Chain, Dates

df = @chain begin
    XLSX.readtable("Quality-Minus-Junk-Factors-Monthly.xlsx", "QMJ Factors", "A:AE"; first_row=19, infer_eltypes=true)
    DataFrame
    transform("DATE" => ByRow(d -> Date(d, dateformat"m/d/Y")) => "DATE")
    select("DATE", "USA" => "QMJ")
end

But I get this error:

┌ Error: Failed to parse internal XML file `_rels/.rels`
└ @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:446
ERROR: EOFError: read end of file
Stacktrace:
  [1] read!
    @ ZipFile C:\Users\beasont\.julia\packages\ZipFile\evaHP\src\Zlib.jl:299 [inlined]
  [2] unsafe_read(f::ZipFile.ReadableFile, p::Ptr{UInt8}, n::UInt64)
    @ ZipFile C:\Users\beasont\.julia\packages\ZipFile\evaHP\src\ZipFile.jl:498
  [3] unsafe_read
    @ EzXML .\io.jl:774 [inlined]
  [4] (::EzXML.var"#7#8")(context::ZipFile.ReadableFile, buffer::Ptr{UInt8}, len::Int32)
    @ EzXML C:\Users\beasont\.julia\packages\EzXML\ZNwhK\src\document.jl:218
  [5] macro expansion
    @ XLSX C:\Users\beasont\.julia\packages\EzXML\ZNwhK\src\error.jl:50 [inlined]
  [6] readxml
    @ XLSX C:\Users\beasont\.julia\packages\EzXML\ZNwhK\src\document.jl:154 [inlined]
  [7] internal_xml_file_read(xf::XLSX.XLSXFile, filename::String)
    @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:444
  [8] xmldocument
    @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:480 [inlined]
  [9] xmlroot
    @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:484 [inlined]
 [10] get_package_relationship_root(xf::XLSX.XLSXFile)
    @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\relationship.jl:51
 [11] parse_relationships!(xf::XLSX.XLSXFile)
    @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:296
 [12] open_or_read_xlsx(source::String, read_files::Bool, enable_cache::Bool, read_as_template::Bool)
    @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:235
 [13] openxlsx(f::XLSX.var"#32#33"{Int64, Nothing, Bool, Bool, Bool, Nothing, Bool, String, String}, source::String; mode::String, enable_cache::Bool)
    @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:135
 [14] openxlsx
    @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:128 [inlined]
 [15] #readtable#31
    @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:611 [inlined]
 [16] top-level scope
    @ REPL[12]:2
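
One way to narrow this down, as a hedged sketch: `check_entries` is a hypothetical helper (not part of XLSX.jl or ZipFile.jl) that asks ZipFile.jl, the library raising the `EOFError` in the trace, to fully decompress every entry of the workbook and records which entries fail and with what error. If only some entries fail, that would suggest a problem with how those entries were compressed rather than with the XML itself; this is an assumption, not a confirmed diagnosis.

```julia
using ZipFile

# Hypothetical diagnostic helper: an .xlsx file is a zip archive, so try to
# inflate every entry with ZipFile.jl and record (entry name, ok?, error text).
function check_entries(path::AbstractString)
    results = Tuple{String,Bool,String}[]
    r = ZipFile.Reader(path)
    try
        for f in r.files
            try
                read(f, String)  # forces a full decompression of this entry
                push!(results, (f.name, true, ""))
            catch err
                push!(results, (f.name, false, sprint(showerror, err)))
            end
        end
    finally
        close(r)  # always release the underlying file handle
    end
    return results
end

# Usage against the downloaded workbook:
# for (name, ok, msg) in check_entries("Quality-Minus-Junk-Factors-Monthly.xlsx")
#     println(ok ? "ok     " : "FAILED ", name, "  ", msg)
# end
```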

Here are the packages in my Project.toml (I am on Julia 1.10 beta 2):

  [336ed68f] CSV v0.10.11
  [13f3f980] CairoMakie v0.10.9
  [8be319e6] Chain v0.5.0
  [992eb4ea] CondaPkg v0.2.18
  [60f91f6f] CovarianceMatrices v0.10.4
  [a10d1c49] DBInterface v2.5.0
  [a93c6f00] DataFrames v1.6.1
  [d2f5444f] DuckDB v0.8.1
  [bd2a388e] FamaFrenchData v0.4.3
⌃ [38e38edf] GLM v1.8.3
  [5432bcbf] PaddedViews v0.5.12
  [6099a3de] PythonCall v0.9.14
  [cbe49d4c] RemoteFiles v0.5.0
⌅ [2913bbd2] StatsBase v0.33.21
⌅ [3eaba693] StatsModels v0.6.33
  [bd369af6] Tables v1.10.1
  [fdbf4ff8] XLSX v0.10.0
  [ade2ca70] Dates
  [f43a241f] Downloads v1.6.0
  [37e2e46d] LinearAlgebra
  [10745b16] Statistics v1.9.0
Info Packages marked with ⌃ and ⌅ have new versions available, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated`

If I open the file in Excel and use Save As to write a new file, everything works fine. I don't know how to do that programmatically, though!
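A possible programmatic stand-in for Excel's Save As, sketched under the assumption that LibreOffice is installed: headless LibreOffice can re-save a workbook with `soffice --headless --convert-to xlsx --outdir <dir> <file>`, producing a freshly written copy that XLSX.jl may then be able to read. `resave_cmd` and `resave` are hypothetical helpers, and whether a LibreOffice re-save actually fixes this particular file has not been verified. On Windows, `soffice` is typically not on `PATH`, so pass the full path to `soffice.exe` via the `soffice` keyword.

```julia
# Hypothetical helpers (assumption: LibreOffice is installed).
# Build the LibreOffice command that re-saves `src` as .xlsx into `outdir`.
resave_cmd(src::AbstractString, outdir::AbstractString; soffice::AbstractString="soffice") =
    `$soffice --headless --convert-to xlsx --outdir $outdir $src`

# Run the conversion and return the path of the re-saved copy.
# LibreOffice keeps the base name, so use an `outdir` different from the
# source directory to avoid overwriting the original file.
function resave(src::AbstractString, outdir::AbstractString; soffice::AbstractString="soffice")
    mkpath(outdir)
    run(resave_cmd(src, outdir; soffice=soffice))
    return joinpath(outdir, first(splitext(basename(src))) * ".xlsx")
end

# Usage (requires LibreOffice):
# using XLSX
# fixed = resave("Quality-Minus-Junk-Factors-Monthly.xlsx", "resaved")
# XLSX.readtable(fixed, "QMJ Factors", "A:AE"; first_row=19, infer_eltypes=true)
```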

tbeason · Sep 18 '23 18:09