JuliaDB.jl
Convert an in-memory table to an out-of-core table
I'm merging JuliaDB tables (these tables come from a data stream) into a global NDSparse table. But I'm running out of memory, so I plan to use a "chunked out-of-core" table. Is there a way to convert my in-memory table to an out-of-core table? The JuliaDB docs mention the loadtable function for creating out-of-core tables, but it only reads CSV files from disk.
See ?distribute. I'll leave this issue open since I couldn't find it in the docs.
help?> distribute
search: distribute
distribute(t::Table, chunks)
Distribute a table in chunks pieces. Equivalent to table(t, chunks=chunks).
──────────────────────────────────────────────────────────────────────────────────────────────────────────
distribute(itable::NDSparse, rowgroups::AbstractArray)
Distributes an NDSparse object into a DNDSparse by splitting it up into chunks of rowgroups elements.
rowgroups is a vector specifying the number of rows in the chunks.
Returns a DNDSparse.
──────────────────────────────────────────────────────────────────────────────────────────────────────────
distribute(itable::NDSparse, nchunks::Int=nworkers())
Distributes an NDSparse object into a DNDSparse of nchunks chunks of approximately equal size.
Returns a DNDSparse.
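To illustrate the second method, here is a minimal sketch of converting an in-memory NDSparse into a distributed (DNDSparse) one; the column names and data are made up for illustration:

```julia
using JuliaDB

# A small in-memory NDSparse table (hypothetical example data)
nd = ndsparse((id = [1, 2, 3, 4],), (value = [10.0, 20.0, 30.0, 40.0],))

# Split it into 2 chunks of roughly equal size, yielding a DNDSparse
dnd = distribute(nd, 2)
```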
The loadtable function has two options: chunks and output. The output option is used to store the table on disk, whereas distribute only has the chunks option, so I guess the table remains in memory. Am I wrong?
You're right, but you can then save it. My understanding is that the following are more-or-less equivalent
loadndsparse(path, chunks=4, output="outdir")
nd = loadndsparse(path)
dnd = distribute(nd, 4)
save(dnd, "outdir")
OK, I see. After saving the chunked table, it can reside on disk instead of in memory. Later, I can push new data into this table without worrying about memory. Thanks for your help; we can close this issue.
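For the record, the later workflow might look roughly like this, assuming JuliaDB's `load` and `merge` work on saved/distributed tables as described above; the paths and data are illustrative, not from the original thread:

```julia
using JuliaDB

# Reload the previously saved distributed table from disk
dnd = load("outdir")

# New incoming data as a small in-memory NDSparse (hypothetical)
newdata = ndsparse((id = [5, 6],), (value = [50.0, 60.0],))

# Merge the new rows into the distributed table, then save the result
dnd2 = merge(dnd, distribute(newdata, 1))
save(dnd2, "outdir2")
```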