JSON.jl icon indicating copy to clipboard operation
JSON.jl copied to clipboard

use_mmap leaves file open

Open awf opened this issue 9 years ago • 8 comments

Calling parsefile with use_mmap=true leaves the file open until the finalizer for the result of mmap_array is called. This can be some time later, so the file typically can't immediately be written.

The fix is messy (not tested, just outline)

function parsefile(filename::AbstractString; ordered::Bool=false, use_mmap=true)
    sz = filesize(filename)
    open(filename) do io
        a = nothing
        if use_mmap
            a = mmap_array(Uint8, (sz,), io)
            s = UTF8String(a)
        else
            s = readall(io)
        end
        out = JSON.parse(s, ordered=ordered)
        finalize(a)
        out
    end
end

so I wonder what value mmap is providing here, given that the array is converted to a string immediately anyway?

awf avatar Jul 05 '15 07:07 awf

I don't understand the context of this issue. You say this is a fix but you don't show the code that would be replaced.

First, UTF8String does not copy the array, it merely wraps it.

Second, the openfile(...) do io ... end idiom does close the file, I believe.

dmbates avatar Jul 05 '15 14:07 dmbates

The issue is not that the file object is kept open, it's that the file is kept mmap'd, which apparently causes issue on windows.

I don't think calling finalize is a good idea though.

yuyichao avatar Jul 05 '15 14:07 yuyichao

See https://github.com/JuliaLang/julia/issues/12011

nalimilan avatar Jul 05 '15 14:07 nalimilan

@nalimilan Yes, pulling the fix from that thread back to here as it is currently an issue in JSON parsefile that means that if I call parsefile with default arguments, I cannot write to the file for some unknown amount of time. In my case, running in a notebook,I have needed to restart the kernel in order to write the file. It is a common pattern to read a file, do something, and write the same file, so this is a debilitating feature of the current implementation.

@yuyichao "Kept open" vs "kept mmapped" may be a distinction, but the outcome is the same.

@yuyichao I don't understand what you mean by "calling finalize is a [bad] idea". If you mean it might be slow, then that's easily trumped by correctness. If there are other potential difficulties, please state them to see if they can be fixed.

Bottom line: existing code leaks a filehandle or something like it. The do pattern doesn't prevent it. Options to fix: ignore use_mmap on Windows, or finalize as above. Final option: add a comment like that on readdlm saying "this doesn't work on windows, set use_mmap to true only if you don't care about leaving the file unwriteable". I personally find that quite distasteful.

awf avatar Jul 06 '15 10:07 awf

@dmbates Apologies, I'm new to Julia and to the conventions for issue management. I'm not proposing a patch, just getting feedback on how people would like to handle this issue. The code that is being replaced is the current function parsefile, defined in JSON.jl, i.e.

function parsefile(filename::AbstractString; ordered::Bool=false, use_mmap=true)
    sz = filesize(filename)
    open(filename) do io
        s = use_mmap ? UTF8String(mmap_array(Uint8, (sz,), io)) : readall(io)
        JSON.parse(s, ordered=ordered)
    end
end

awf avatar Jul 06 '15 10:07 awf

@yuyichao "Kept open" vs "kept mmapped" may be a distinction, but the outcome is the same.

I know. I'm just trying to explain what @dmbates might be confused about.

I don't understand what you mean by "calling finalize is a [bad] idea". If you mean it might be slow, then that's easily trumped by correctness.

More or less, but I think a better way would be to fix the interface of mmap'ed arrays to give a better finalization interface.

yuyichao avatar Jul 06 '15 12:07 yuyichao

Has there been any progress in finalization of mmap'ed arrays in Julia since 2015?

kmsquire avatar May 18 '19 23:05 kmsquire

I was visiting these pages in order to report that on Windows using JSON.parsefile leaves the file locked, but then noticed that the issue had been raised back in 2015. It was good to learn that passing the use_mmap argument as false is a work-around (which I confirmed with some quick tests) but this feature is indeed quite debilitating on Windows, as @awf remarked.

PGS62 avatar Nov 01 '19 13:11 PGS62