Mmap doesn't properly release a file after use (in Windows)
using Mmap
fname = tempname()
write(fname, "bar")
function reading()
    io = open(fname)
    mm = Mmap.mmap(io)
    q = view(mm, 1:2)
    close(io)
end
reading()
isfile(fname) && rm(fname)
gives
ERROR: LoadError: IOError: unlink("C:\\Users\\nzimm\\AppData\\Local\\Temp\\jl_g1GveD8R03"): permission denied (EACCES)
Stacktrace:
[1] uv_error
@ .\libuv.jl:106 [inlined]
[2] unlink(p::String)
@ Base.Filesystem .\file.jl:1105
[3] rm(path::String; force::Bool, recursive::Bool)
@ Base.Filesystem .\file.jl:283
[4] rm(path::String)
@ Base.Filesystem .\file.jl:273
[5] top-level scope
@ C:\Users\nzimm\github\ZipArchives.jl\mmap.jl:16
in expression starting at C:\Users\nzimm\github\ZipArchives.jl\mmap.jl:16
Inserting GC.gc() between the call to reading() and the file deletion fixes the issue.
GC is an expensive call. Assuming mmap doesn't need everything GC does, can mmap (or close) do whatever it needs for itself, so that the GC call becomes unnecessary?
This issue is discussed on Discourse, where it is reported that it doesn't arise on Linux.
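For illustration, here is a minimal sketch of one way the mapping might be released without a full GC pass. It rests on an assumption that is not confirmed anywhere in this thread: that the array returned by Mmap.mmap carries a finalizer which unmaps the file, so that calling finalize on it releases the mapping right away. The function name reading_and_release is hypothetical. The bytes are copied out first, because a view into the mapping must not be touched once the mapping is gone.

```julia
using Mmap

fname = tempname()
write(fname, "bar")

# Hypothetical variant of reading(): copy what is needed, then drop the mapping.
function reading_and_release(path)
    io = open(path)
    mm = Mmap.mmap(io)
    q = copy(view(mm, 1:2))  # copy, so nothing refers to the mapped memory afterwards
    close(io)
    finalize(mm)             # assumption: runs the finalizer that unmaps the file
    return q
end

bytes = reading_and_release(fname)
isfile(fname) && rm(fname)
```

Whether this is enough on Windows would need to be checked there; mm and any view of it must not be used after the finalize call.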
Some things that come to mind, maybe helpful:
- Have you tried passing a closure to open, as suggested on Discourse?
- Have you tried setting shared to false, assuming that's OK for your application?
- Hypothetically speaking, this might turn out to be an issue that's not fixable in Julia itself, for example maybe it's a libuv issue or just an element of how Windows is designed. That said, if it is a libuv issue, that should be fixable, sooner or later.
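As a rough sketch of what the first two suggestions could look like together: the do block below is the "closure passed to open" form, which closes the stream when the block exits, and shared=false is the keyword argument of Mmap.mmap. The variable name first_two is only illustrative. This shows the syntax only; it makes no claim that either change releases the mapping on Windows.

```julia
using Mmap

fname = tempname()
write(fname, "bar")

# open(path) do io ... end passes an anonymous function (a closure) to open;
# the stream is closed automatically when the block finishes.
first_two = open(fname) do io
    mm = Mmap.mmap(io; shared=false)  # private mapping instead of the default shared one
    String(copy(view(mm, 1:2)))       # copy the bytes out before leaving the block
end
```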
Relevant code:
https://github.com/JuliaLang/julia/blob/a23ce4bcbbb06be413169424cbffee0718b8a431/stdlib/Mmap/src/Mmap.jl#L254-L265
close is an operation on the IO, but the kernel doesn't care about that after creating the mmap. If you don't like this behavior, use a better kernel (not NT family) or a better way of reading files (such as read)
Have you tried passing a closure to open, as suggested on Discourse?
I don't clearly understand what a closure is, but I have done what I think the example you've linked shows. Putting the mmap call into a function is what allows the GC call (outside the function) to work, but it is not enough on its own, without the GC call.
Have you tried setting shared to false, assuming that's OK for your application?
No. I'll check that out now.
Hypothetically speaking, this might turn out to be an issue that's not fixable in Julia itself, for example maybe it's a libuv issue or just an element of how Windows is designed. That said, if it is a libuv issue, that should be fixable, sooner or later.
My naive assumption is that if GC can do it, then it is possible in Julia. Perhaps some small part of whatever GC does can be extracted and used in mmap, perhaps as an API call.
close is an operation on the IO, but the kernel doesn't care about that after creating the mmap.
OK, so close isn't part of the solution.
If you don't like this behavior, use a better kernel (not NT family)
Is there a way for the package I'm contributing to to switch to a better kernel if the user of the package is on a Windows system?
or a better way of reading files (such as read)
I'm trying to use mmap to support accessing data in files too big for memory. This is an exceptional case and, by default, the package I'm working on (XLSX.jl, not my package) does usually use read. The package itself uses ZipArchives, and mmap is what ZipArchives offers for this purpose. Is there another mechanism you'd recommend?
Have you tried setting shared to false, assuming that's OK for your application?
No. I'll check that out now.
So I have now tried this. It didn't do the trick and I still needed a GC call to get things working.
Sounds like an implementation or design problem of ZipArchives stream. Have you asked them about removing the mmap hack in favor of a proper streaming API? It looks like the README currently notes that the inability to handle large files is one of the limitations of that package compared with its predecessor, ZipFile?
I can't speak for the ZipArchives author, of course, but in the discourse thread referred to above they have indicated that they will investigate a streaming API.
I read the ZipArchives README to mean the exact opposite of your interpretation:
Currently, ZipArchives has the following benefits over ZipFile:
Full ZIP64 support: archives larger than 4GB can be written.
Sounds like an implementation or design problem of ZipArchives stream.
The issue reported in the OP does not refer to ZipArchives at all.
I am closing this, as this is mostly an inherent limitation of mmap on Windows.
Currently, DiskArrays.jl can be used as described in https://discourse.julialang.org/t/struggling-to-use-mmap-with-ziparchives/129839/19
There, reading is done with:
function DiskArrays.readblock!(a::SimpleFileDiskArray,aout,i::AbstractUnitRange)
    open(a.file) do f
        seek(f,first(i)-1)
        read!(f,aout)
    end
end
So the file is only opened while it is being read, which might be what you want on Windows.
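The same open/seek/read! pattern can be tried outside DiskArrays. The sketch below uses a hypothetical helper, read_byte_range, which is not part of any package mentioned here; it reads a 1-based, inclusive byte range into a fresh buffer and keeps the file open only for the duration of the read.

```julia
# Hypothetical helper: read bytes r (1-based, inclusive) from the file at path.
function read_byte_range(path::AbstractString, r::AbstractUnitRange)
    out = Vector{UInt8}(undef, length(r))
    open(path) do f
        seek(f, first(r) - 1)  # seek takes a 0-based byte offset
        read!(f, out)          # fills out completely, or throws EOFError
    end
    return out
end

fname = tempname()
write(fname, "foobar")
chunk = read_byte_range(fname, 4:6)
rm(fname)  # the file is already closed here, so no mapping keeps it locked
```

Only the requested range is held in memory, so this also fits the "file too big for memory" case, at the cost of one open and one seek per read.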
I'll add this as a suggestion to the ZipArchives.jl README, and try to improve its performance. Ref: https://github.com/JuliaIO/ZipArchives.jl/issues/92
I'll also add a comparison to the streaming API for reading .ZIP files in https://github.com/reallyasi9/ZipStreams.jl