ZipFile.jl
SystemError seek: Bad file descriptor
julia> using ZipFile
julia> io = ZipFile.Reader("test/gml/polblogs.zip").files[1]
ZipFile.ReadableFile(name=polblogs.gml, method=Deflate, uncompresssedsize=977839, compressedsize=93369, mtime=1.156468828e9)
julia> readline(io)
"Creator \"Lada Adamic on Tue Aug 15 2006\"\n"
julia> readline(io)
ERROR: SystemError: seek: Bad file descriptor
in seek at ./iostream.jl:49
in read at /home/andrew/.julia/v0.4/ZipFile/src/ZipFile.jl:410
in readuntil at io.jl:174
in readuntil at io.jl:156
in readline at io.jl:217
the file is from http://www-personal.umich.edu/%7Emejn/netdata/polblogs.zip
if i do zmore or similar at the command line it has plenty more lines.
am i doing something dumb or is this an issue in your library? i was hoping it would be a simple IO instance i could treat like a file (including rewind).
thanks.
edit:
Version 0.4.0-dev+5928 (2015-07-12 04:57 UTC)
Commit a9e0dd2 (5 days old master)
x86_64-suse-linux
ZipFile was latest (Pkg.update()) at time of posting.
[edit2: cut + paste header from wrong julia - this was with 0.4 trunk, as updated above]
works fine on 0.3 with the same file / machine by the way.
well, updating julia to the latest from git seemed to fix this, so i guess it was a problem with trunk 5 days ago!
spoke too soon! it now occurs after some random number of readlines, around 2000 or 3000.
(and so does 0.3 if you wait long enough!)
(that's 0.3 from git, not a released 0.3)
I'm getting the same error with this, from Immerse.jl:
const testdir = splitdir(@__FILE__)[1]
const facesdir = joinpath(testdir, "orl_faces")
const orl_url = "http://www.cl.cam.ac.uk/Research/DTG/attarchive/pub/data/att_faces.zip"
function unzip(inputfilename, outputpath=pwd())
    r = ZipFile.Reader(inputfilename)
    for f in r.files
        outpath = joinpath(outputpath, f.name)
        if isdirpath(outpath)
            mkpath(outpath)
        else
            open(outpath, "w") do io
                write(io, read(f))
            end
        end
    end
    nothing
end
julia> unzip(fn, facesdir)
ERROR: SystemError: seek: Bad file descriptor
Stacktrace:
[1] #systemerror#39(::Nothing, ::Function, ::String, ::Bool) at ./error.jl:106
[2] systemerror at ./error.jl:106 [inlined]
[3] seek(::IOStream, ::Int64) at ./iostream.jl:101
[4] read(::ZipFile.ReadableFile, ::Array{UInt8,1}) at /Users/jbieler/.julia/packages/ZipFile/02Psc/src/ZipFile.jl:452
[5] read at /Users/jbieler/.julia/packages/ZipFile/02Psc/src/iojunk.jl:11 [inlined]
[6] readbytes!(::ZipFile.ReadableFile, ::Array{UInt8,1}, ::Int64) at ./io.jl:813
[7] read at ./io.jl:836 [inlined]
[8] read(::ZipFile.ReadableFile) at ./io.jl:835
[9] (::getfield(Main, Symbol("##23#24")))(::IOStream) at ./REPL[56]:9
[10] #open#298(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::getfield(Main, Symbol("##23#24")), ::String, ::Vararg{String,N} where N) at ./iostream.jl:369
[11] open at ./iostream.jl:367 [inlined]
[12] unzip(::String, ::String) at ./REPL[56]:8
[13] top-level scope at none:0
Is anything wrong here?
I have the same problem with v0.8.1. It seems to be related to the finalizer in ZipFile.Reader: once the Reader reference goes out of scope and the Reader is garbage-collected, the zip file's IO stream is closed and seek fails.
This workaround works for me in a similar function:
global r = ZipFile.Reader(inputfilename)
It seems we have to explicitly close the file.
Would it be possible for ZipFile to support the do block like a normal file IO operation? E.g.:
ZipFile.Writer("example.zip") do w
    f1 = ZipFile.addfile(w, "file1.txt")
    write(f1, "hello world!\n")
end
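Until something like that exists in the package, a do-block form can be approximated in user code. The following is only a minimal sketch: `with_zip_reader` is a hypothetical helper name, not part of ZipFile.jl, and it relies on the behaviour described in this thread, namely that a Reader which is still referenced (here by the `close` call in `finally`) is not finalized early.

```julia
using ZipFile

# Hypothetical helper (NOT part of ZipFile.jl): open a Reader, hand it to `f`,
# and always close it afterwards. The `close(r)` in `finally` is also a later
# use of `r`, so the Reader stays referenced while `f` runs.
function with_zip_reader(f::Function, path::AbstractString)
    r = ZipFile.Reader(path)
    try
        return f(r)
    finally
        close(r)
    end
end

# Usage: read the first entry of an archive as a String.
# text = with_zip_reader("example.zip") do r
#     read(r.files[1], String)
# end
```

Anything read inside the block has to be materialized (for example as a `String` or byte vector) before the block returns, because the entries are no longer readable once the Reader is closed.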
I just ran into this too.
I was about to compare runtimes of unzipping versus reading from the zip.
The error occurs especially when using @btime.
The global approach works, but seems suboptimal.
MWE
using BenchmarkTools  # provides @btime
using CSV
using DataFrames
using ZipFile
src = raw"https://www.stats.govt.nz/assets/Uploads/Electronic-card-transactions/Electronic-card-transactions-June-2020/Download-data/electronic-card-transactions-june-2020-csv-tables.zip"
# Read the first .csv entry straight from the archive
function readfromzip(zipFile, csvSep)
    z = ZipFile.Reader(zipFile)
    zippedcsv = filter(x -> splitext(x.name)[2] == ".csv", z.files)[1]
    aDf = CSV.read(read(zippedcsv), DataFrame, delim=csvSep, copycols=true, pool=false, lazystrings=true)
    return aDf
end
# Extract the .csv files with 7z into a temp dir, then read the first one
function unzipandread(zipFile, csvSep)
    outputFolder = mktempdir()
    cmd = `7z e $(zipFile) \*.csv -o$(outputFolder)`
    read(cmd)
    fi = readdir(outputFolder, join=true)[1]
    aDf = CSV.read(fi, DataFrame, delim=csvSep, copycols=true, pool=false, lazystrings=true)
    return aDf
end
zipFile = download(src);
csvSep = ','
@time d1 = unzipandread(zipFile, csvSep);
@time d2 = readfromzip(zipFile, csvSep);
@assert isequal(d1, d2)
@btime unzipandread(zipFile, ',');
@btime readfromzip(zipFile, ',');
# either @time or @btime of readfromzip throws this error
ERROR: SystemError: seek: Bad file descriptor
Stacktrace:
[1] systemerror(::String, ::Int32; extrainfo::Nothing) at .\error.jl:168
[2] #systemerror#50 at .\error.jl:167 [inlined]
[3] systemerror at .\error.jl:167 [inlined]
[4] seek(::IOStream, ::Int64) at .\iostream.jl:108
[5] read(::ZipFile.ReadableFile, ::Type{UInt8}) at C:\Users\me\.julia\packages\ZipFile\AwgTV\src\ZipFile.jl:488
[6] readbytes!(::ZipFile.ReadableFile, ::Array{UInt8,1}, ::Int64) at .\io.jl:889
[7] read at .\io.jl:912 [inlined]
[8] read at .\io.jl:911 [inlined]
[9] readfromzip(::String, ::Char) at .\REPL[29]:4
[10] top-level scope at .\util.jl:175
versioninfo()
julia> versioninfo()
Julia Version 1.4.2
Commit 44fa15b150* (2020-05-23 18:35 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i9-7980XE CPU @ 2.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-8.0.1 (ORCJIT, skylake)
Environment:
JULIA_NUM_THREADS = 16
JULIA_EDITOR = "C:\Program Files\Microsoft VS Code\Code.exe"
I believe I'm facing this issue as well. First, you need to keep the dir object assigned before you do the "read"; you can't just keep one of the file objects. Furthermore, this dir object apparently has to be in the global scope.
I just ran into this - has there ever been a reliable workaround? I'm doing this:
for zf ∈ zip_files
    rf = only(ZipFile.Reader(zf).files)
    if rf.name ∉ readdir(zip_folder_path) # Check whether file exists already to avoid duplication
        read_file = read(rf)
        target_dir = normpath(zip_folder_path, rf.name)
        write(target_dir, read_file)
    end
end
This randomly errors for some files, but when I run it multiple times I'll eventually get through the whole list of files (I've got 86 zip files in the directory), i.e. there aren't any "bad" files in there.
I could of course just add a try/catch and then wrap the whole loop in a for _ in 1:50; ...; end loop, which hopefully successfully unzips all files, but that seems a bit brittle...
Your suggested try/catch loop could be improved with a while loop where you check for success for each file to be unzipped. Still brittle, but no need to loop to 50 :)
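For illustration, the per-file retry could look roughly like the sketch below. `try_until_success` and its `max_attempts` keyword are hypothetical names made up for this example, and the approach only papers over the error rather than addressing its cause.

```julia
# Hypothetical helper: call `f()` in a while loop until it succeeds, rethrowing
# the last error once `max_attempts` attempts have failed.
function try_until_success(f::Function; max_attempts::Integer = 50)
    attempt = 1
    while true
        try
            return f()
        catch
            attempt >= max_attempts && rethrow()
            attempt += 1
        end
    end
end

# Usage inside the loop over zip files (`extract_one` is a placeholder for the
# per-file read/write code):
# for zf in zip_files
#     try_until_success(() -> extract_one(zf); max_attempts = 10)
# end
```

Each file is retried on its own, so files that already succeeded are not unzipped again.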
Just ran into this. I think the problem is the underlying file gets closed if Julia garbage-collects the reader. To ensure that doesn't happen you can use a pattern like this:
rdr = ZipFile.Reader(filename)
# ... do stuff with rdr.files
close(rdr)
The last call to close is very important: referencing the reader here prevents the compiler/garbage collector from trashing the reader before we get here, as it knows we will still need it at this point for the close call. If you are opening a ZipFile with an existing io object, close() will be a no-op, but I think it should still prevent gc.
I think this could be fixed on the ZipFile side by having a reference to the original Reader inside ReadableFile (and similarly for Writer/WritableFile). That way the Reader can't be gc'ed while there are still ReadableFile instances in scope.
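The same idea can be tried from user code without touching the package. This is a rough sketch under that assumption: `PinnedEntry` and `open_entry` are hypothetical names (not ZipFile.jl API), and `GC.@preserve` is added as an extra precaution so the wrapper, and with it the Reader, is kept alive for the duration of each call.

```julia
using ZipFile

# Hypothetical wrapper (NOT part of ZipFile.jl): pairs one entry with the
# Reader that owns the underlying stream, so whoever holds the wrapper also
# holds the Reader.
struct PinnedEntry
    reader::ZipFile.Reader
    file::ZipFile.ReadableFile
end

# Open the archive at `path` and pin its `index`-th entry.
function open_entry(path::AbstractString, index::Integer)
    r = ZipFile.Reader(path)
    return PinnedEntry(r, r.files[index])
end

# Forward a few IO operations to the wrapped entry.
Base.readline(e::PinnedEntry) = GC.@preserve e readline(e.file)
Base.eof(e::PinnedEntry) = GC.@preserve e eof(e.file)
Base.close(e::PinnedEntry) = close(e.reader)
```

Only `readline`, `eof`, and `close` are forwarded here; a fuller version would have to forward whatever other IO methods the caller needs.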
So, is this because the garbage collector throws away the original ZipFile.Reader object? If that's the case, referring to the object in the global scope will be a workaround.
I'm using julia 1.9.2 and ZipFile v0.10.1 .
Here is my sample code that crashes with "seek: Bad file descriptor" while reading from a large-ish file in the zip archive:
using ZipFile
function openzipstream()
    r = ZipFile.Reader("tmp.zip")
    display(r.files)
    return r.files[2]
end
function printout()
    is = openzipstream()
    println(readline(is))
    cnt = 1
    for line in eachline(is)
        println("$(cnt): $(line)")
        cnt += 1
    end
end
printout()
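If the garbage-collection explanation given earlier in the thread is right, one possible rearrangement of this sample is to return the Reader together with the entry and close it only after the last line has been read. This is a sketch, not a confirmed fix: the `path`, `index`, and `io` parameters are additions for illustration, and every line is numbered here, whereas the original prints the first line without a number.

```julia
using ZipFile

# Return the Reader along with the entry so the caller can keep it referenced.
function openzipstream(path::AbstractString, index::Integer)
    r = ZipFile.Reader(path)
    return r, r.files[index]
end

function printout(path::AbstractString, index::Integer; io::IO = stdout)
    r, is = openzipstream(path, index)
    for (cnt, line) in enumerate(eachline(is))
        println(io, "$(cnt): $(line)")
    end
    close(r)  # this later use of `r` keeps the Reader referenced during the loop
end

# printout("tmp.zip", 2)
```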