XLSX.jl
Cannot open file (failed to parse internal XML file)
I'm not sure what the problem is, but I've run into a file that XLSX.jl cannot open. Luckily the file is public, so maybe somebody can figure this out. Link: https://www.aqr.com/-/media/AQR/Documents/Insights/Data-Sets/Quality-Minus-Junk-Factors-Monthly.xlsx
With that file downloaded, I try this:
using XLSX, DataFrames, Chain, Dates

df = @chain begin
    XLSX.readtable("Quality-Minus-Junk-Factors-Monthly.xlsx", "QMJ Factors", "A:AE"; first_row=19, infer_eltypes=true)
    DataFrame
    transform("DATE" => ByRow(d -> Date(d, dateformat"m/d/Y")) => "DATE")
    select("DATE", "USA" => "QMJ")
end
But I get this error:
┌ Error: Failed to parse internal XML file `_rels/.rels`
└ @ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:446
ERROR: EOFError: read end of file
Stacktrace:
[1] read!
@ ZipFile C:\Users\beasont\.julia\packages\ZipFile\evaHP\src\Zlib.jl:299 [inlined]
[2] unsafe_read(f::ZipFile.ReadableFile, p::Ptr{UInt8}, n::UInt64)
@ ZipFile C:\Users\beasont\.julia\packages\ZipFile\evaHP\src\ZipFile.jl:498
[3] unsafe_read
@ EzXML .\io.jl:774 [inlined]
[4] (::EzXML.var"#7#8")(context::ZipFile.ReadableFile, buffer::Ptr{UInt8}, len::Int32)
@ EzXML C:\Users\beasont\.julia\packages\EzXML\ZNwhK\src\document.jl:218
[5] macro expansion
@ XLSX C:\Users\beasont\.julia\packages\EzXML\ZNwhK\src\error.jl:50 [inlined]
[6] readxml
@ XLSX C:\Users\beasont\.julia\packages\EzXML\ZNwhK\src\document.jl:154 [inlined]
[7] internal_xml_file_read(xf::XLSX.XLSXFile, filename::String)
@ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:444
[8] xmldocument
@ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:480 [inlined]
[9] xmlroot
@ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:484 [inlined]
[10] get_package_relationship_root(xf::XLSX.XLSXFile)
@ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\relationship.jl:51
[11] parse_relationships!(xf::XLSX.XLSXFile)
@ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:296
[12] open_or_read_xlsx(source::String, read_files::Bool, enable_cache::Bool, read_as_template::Bool)
@ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:235
[13] openxlsx(f::XLSX.var"#32#33"{Int64, Nothing, Bool, Bool, Bool, Nothing, Bool, String, String}, source::String; mode::String, enable_cache::Bool)
@ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:135
[14] openxlsx
@ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:128 [inlined]
[15] #readtable#31
@ XLSX C:\Users\beasont\.julia\packages\XLSX\we7i6\src\read.jl:611 [inlined]
[16] top-level scope
@ REPL[12]:2
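The trace shows the failure happens inside ZipFile.jl while it reads `_rels/.rels`, before any worksheet is parsed. A minimal sketch for narrowing this down, assuming ZipFile.jl has been added to the environment: the hypothetical helper `check_zip_entries` tries to decompress every entry of the archive with ZipFile.jl alone (no XML parsing) and reports which entries fail. If `_rels/.rels` also fails here, the problem would seem to sit at the zip level rather than in XLSX.jl's XML handling.

```julia
using ZipFile

# Hypothetical helper: fully decompress one archive entry and record the outcome.
function check_entry(f)
    info = (name = f.name,
            method = Int(f.method),                 # 0 = Store, 8 = Deflate
            compressedsize = Int(f.compressedsize),
            uncompressedsize = Int(f.uncompressedsize))
    try
        read(f, String)                              # reads the entry until EOF
        return (; info..., ok = true, error = "")
    catch err
        return (; info..., ok = false, error = sprint(showerror, err))
    end
end

# Hypothetical helper: check every entry of an .xlsx (zip) file with ZipFile.jl only.
function check_zip_entries(path::AbstractString)
    r = ZipFile.Reader(path)
    try
        return [check_entry(f) for f in r.files]
    finally
        close(r)                                     # release the file handle
    end
end
```

Running `filter(e -> !e.ok, check_zip_entries("Quality-Minus-Junk-Factors-Monthly.xlsx"))` would list only the entries ZipFile.jl cannot read, together with their stored sizes and compression method.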
Here is my package status (I am on Julia 1.10 beta 2):
[336ed68f] CSV v0.10.11
[13f3f980] CairoMakie v0.10.9
[8be319e6] Chain v0.5.0
[992eb4ea] CondaPkg v0.2.18
[60f91f6f] CovarianceMatrices v0.10.4
[a10d1c49] DBInterface v2.5.0
[a93c6f00] DataFrames v1.6.1
[d2f5444f] DuckDB v0.8.1
[bd2a388e] FamaFrenchData v0.4.3
⌃ [38e38edf] GLM v1.8.3
[5432bcbf] PaddedViews v0.5.12
[6099a3de] PythonCall v0.9.14
[cbe49d4c] RemoteFiles v0.5.0
⌅ [2913bbd2] StatsBase v0.33.21
⌅ [3eaba693] StatsModels v0.6.33
[bd369af6] Tables v1.10.1
[fdbf4ff8] XLSX v0.10.0
[ade2ca70] Dates
[f43a241f] Downloads v1.6.0
[37e2e46d] LinearAlgebra
[10745b16] Statistics v1.9.0
Info Packages marked with ⌃ and ⌅ have new versions available, but those with ⌅ are restricted by compatibility constraints from upgrading. To see why use `status --outdated`
If I open the file in Excel and use Save As to write a new file, the new file reads fine. I don't know how to do that programmatically, though!
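One possible way to mimic that Save As step from Julia is to let LibreOffice rewrite the workbook in headless mode. This is a sketch under assumptions: LibreOffice must be installed, its `soffice` executable must be on PATH (on Windows it usually is not, so pass the full path), and it is an assumption, not something tested on this file, that LibreOffice's rewritten file reads as cleanly as Excel's. The helper names `resave_cmd`, `resaved_path` and `resave_xlsx` are hypothetical.

```julia
# Hypothetical helper: build the LibreOffice command that re-saves a workbook as .xlsx.
# Interpolating a String into a Cmd keeps it as one argument, so paths with spaces work.
function resave_cmd(src::AbstractString, outdir::AbstractString; soffice::AbstractString = "soffice")
    return `$soffice --headless --convert-to xlsx --outdir $outdir $src`
end

# LibreOffice names the output after the input's base name with the new extension.
resaved_path(src::AbstractString, outdir::AbstractString) =
    joinpath(outdir, splitext(basename(src))[1] * ".xlsx")

# Hypothetical helper: run the conversion into a fresh directory (so the original
# file is never overwritten) and return the path of the re-saved copy.
function resave_xlsx(src::AbstractString; outdir::AbstractString = mktempdir(),
                     soffice::AbstractString = "soffice")
    run(resave_cmd(src, outdir; soffice = soffice))
    out = resaved_path(src, outdir)
    isfile(out) || error("LibreOffice did not produce $out")
    return out
end
```

If the conversion works, the original pipeline could then start from `XLSX.readtable(resave_xlsx("Quality-Minus-Junk-Factors-Monthly.xlsx"), "QMJ Factors", "A:AE"; first_row=19, infer_eltypes=true)`. Writing into a temporary directory keeps the untouched download around for comparison.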