CSV.jl
How can I delete a file after reading its content with CSV.Rows?
I am trying to delete a file after reading its content, but I am getting a permission denied error. This code
using CSV
using Random
Random.seed!(0)
open("test.csv", "w") do f
    for _ in 1:100_000
        Base.write(f, join([randstring('a':'z') for _ in 1:8], ","))
        Base.write(f, "\n")
    end
end
for r in CSV.Rows("test.csv")
    # something
end
rm("test.csv")
produces
ERROR: IOError: unlink("test.csv"): permission denied (EACCES)
Probably the most reliable way is to do something like:
for r in CSV.Rows("test.csv")
    # something
end
GC.gc(); GC.gc()
rm("test.csv")
The problem is that the file isn't technically "released" until the CSV.Rows object gets gc-ed. This is somewhat of an open problem in Julia. We could make this a tad more formal by defining a finalizer function for CSV.Rows, but you'd still have to call finalize(rows) when you wanted the file "released".
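To illustrate what "released" means here, the following is a minimal sketch using only the Mmap standard library rather than CSV.jl itself. It assumes the file is held open through a memory mapping and relies on Mmap.mmap attaching a finalizer to the array it returns. The file name and contents are made up for the example.

```julia
using Mmap

# Throwaway input file (hypothetical contents, only for this sketch).
path = tempname()
write(path, "a,b,c\n1,2,3\n")

buf = Mmap.mmap(path)                  # Vector{UInt8} backed by the file
nlines = count(==(UInt8('\n')), buf)   # use the data while it is still mapped

finalize(buf)   # run the array's finalizer now instead of waiting for the GC
rm(path)        # the mapping is gone, so the delete is no longer blocked
```

After finalize(buf), the array must not be read again, since it would point at unmapped memory.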
Well, I can totally call a finalizer; that does not bother me as a user. Currently, my attempt consists of doing something like this:
rows_iterator = CSV.Rows("test.csv")
for r in rows_iterator
    # something
end
rows_iterator = nothing
GC.gc()
Yes, that should work, though sometimes you have to call GC.gc() twice in order to fully collect an object.
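Since the number of GC.gc() calls needed is not fixed, one way to package this workaround is a small retry helper. This is only a sketch: rm_after_gc is a hypothetical name, not part of CSV.jl or Base, and it can only succeed once nothing in your program still references the CSV.Rows object.

```julia
# Hypothetical helper: run the GC, then try to delete `path`, up to `attempts` times.
# Returns the attempt number that succeeded.
function rm_after_gc(path::AbstractString; attempts::Integer=3)
    for attempt in 1:attempts
        GC.gc()  # give pending finalizers a chance to release the file
        try
            rm(path)
            return attempt
        catch err
            # Retry only on I/O errors (such as EACCES) while attempts remain;
            # anything else, or the final failure, is rethrown to the caller.
            (err isa Base.IOError && attempt < attempts) || rethrow()
        end
    end
    return nothing  # only reached when attempts < 1
end

# Usage with a throwaway file that nothing else holds open.
tmp = tempname()
write(tmp, "a,b\n")
rm_after_gc(tmp)
```

The helper does not make the release deterministic; it only hides the "call GC.gc() once or twice" guesswork behind a bounded loop.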
If we go the finalizer route, we'll just have to add some checks in other places like iterate to ensure a CSV.Rows is still "valid" and hasn't been finalized since that would lead to really bad scenarios.
If you give me a little guidance or a small sketch I will be happy to open a PR. :)
Alright, sorry for the slow response here, it might be a little hairy, but here's some guidance/sketch, though I'll admit I haven't thought this through all the way to the end (hence a sketch!):
- add a finalized::Base.RefValue{Bool} field to Rows struct (alternatively we could make this finalized::Threads.Atomic{Bool} if we're worried about thread safety)
- Add an official CSV.releaseinput user-facing API function; this would set rows.finalized[] = true, and call finalize(rows.ctx.buf), which should release the mmapping of the input file
- Update the Rows iterate method to check if it's been finalized and if so, return nothing or throw an error
- Probably need to pass the finalized field to Row2 struct as well, and check if the input has been finalized in getcolumn, though... maybe not, since we're saving the values in the values field. But if they're PosLen values, then they would be invalid, because they just point into the original input, so yeah, I do think we'd need to check if the original buf is still valid in getcolumn for PosLen values at least. This is probably the hairiest part where there could be corner cases. The thing we'd want to avoid is someone having a "row" (CSV.Row2), and then trying to use that row after the CSV.Rows object has had CSV.releaseinput called on it and getcolumn doing something invalid by interacting with a finalized buf.
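The steps above can be mocked up outside CSV.jl to see how the pieces fit together. The sketch below is a toy model under stated assumptions, not the CSV.jl implementation: ToyRows, ToyRow, checkvalid, and text are hypothetical names, the input is split on newlines only, and thread safety (the Threads.Atomic{Bool} variant) is ignored. The one name borrowed from the proposal is releaseinput, defined here on the toy type.

```julia
using Mmap

# Toy stand-in for CSV.Rows; all names in this block are hypothetical.
struct ToyRows
    buf::Vector{UInt8}               # mmapped input
    finalized::Base.RefValue{Bool}   # set to true once the input is released
end
ToyRows(path::AbstractString) = ToyRows(Mmap.mmap(path), Ref(false))

# Toy stand-in for a row whose values, like PosLen, only point into buf.
struct ToyRow
    buf::Vector{UInt8}
    range::UnitRange{Int}
    finalized::Base.RefValue{Bool}   # shared with the parent ToyRows
end

checkvalid(flag) = flag[] && throw(ArgumentError("input has already been released"))

function releaseinput(rows::ToyRows)
    rows.finalized[] && return nothing   # calling it twice is harmless
    rows.finalized[] = true
    finalize(rows.buf)                   # unmaps the file via the mmap finalizer
    return nothing
end

function Base.iterate(rows::ToyRows, pos::Int=1)
    checkvalid(rows.finalized)           # refuse to iterate a released input
    pos > length(rows.buf) && return nothing
    nl = findnext(==(UInt8('\n')), rows.buf, pos)
    stop = nl === nothing ? length(rows.buf) : nl - 1
    return ToyRow(rows.buf, pos:stop, rows.finalized), stop + 2
end

# Analogue of getcolumn: check the shared flag before touching buf.
function text(row::ToyRow)
    checkvalid(row.finalized)
    return String(row.buf[row.range])    # indexing copies the bytes out of buf
end

# Usage: read, release explicitly, then delete.
path = tempname()
write(path, "a,b\nc,d\n")
rows = ToyRows(path)
first_row = first(rows)
first_text = text(first_row)
releaseinput(rows)
rm(path)
# text(first_row) now throws an ArgumentError instead of reading unmapped memory.
```

Sharing one Ref between the iterator and every row it hands out is what lets a stale row detect the release; the cost is a flag check on each access.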