JuliaDB.jl

Convert an in memory table to an out of core table

Open Gyslain opened this issue 7 years ago • 4 comments
I'm merging JuliaDB tables (they arrive from a data stream) into a global ndsparse table, but I'm running out of memory, so I plan to use a "chunked out-of-core" table. Is there a way to convert my in-memory table to an out-of-core table? The JuliaDB docs mention the loadtable function for creating out-of-core tables, but it only reads CSV files from disk.

Gyslain avatar Nov 19 '18 14:11 Gyslain

See ?distribute. I'll leave this issue open since I couldn't find it in the docs.

help?> distribute
search: distribute

  distribute(t::Table, chunks)

  Distribute a table in chunks pieces. Equivalent to table(t, chunks=chunks).

  ──────────────────────────────────────────────────────────────────────────────────────────────────────────

  distribute(itable::NDSparse, rowgroups::AbstractArray)

  Distributes an NDSparse object into a DNDSparse by splitting it up into chunks of rowgroups elements.
  rowgroups is a vector specifying the number of rows in the chunks.

  Returns a DNDSparse.

  ──────────────────────────────────────────────────────────────────────────────────────────────────────────

  distribute(itable::NDSparse, nchunks::Int=nworkers())

  Distributes an NDSparse object into a DNDSparse of nchunks chunks of approximately equal size.

  Returns a DNDSparse.

joshday avatar Nov 19 '18 14:11 joshday
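
As an illustration of the last method in the help text above, here is a minimal sketch of distributing a small in-memory NDSparse. The toy data and the column names (id, t, v) are made up for the example, and it has not been checked against a particular JuliaDB version.

```julia
using JuliaDB

# Hypothetical toy data: two index columns (id, t) and one value column (v).
nd = ndsparse((id = [1, 1, 2, 2], t = [1, 2, 1, 2]),
              (v = [0.1, 0.2, 0.3, 0.4],))

# Split the in-memory NDSparse into 2 chunks of approximately equal size.
# Per the docstring, this returns a DNDSparse.
dnd = distribute(nd, 2)

# Gather the chunks back into a single in-memory NDSparse to inspect them.
back = collect(dnd)
```

Note that distribute only splits the table into chunks; by itself it does not write anything to disk.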

The loadtable function has two options: chunks and output. The output option is used to store the table on disk, whereas distribute only takes the number of chunks, so I guess the table remains in memory. Am I wrong?

Gyslain avatar Nov 19 '18 14:11 Gyslain

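You're right, but you can then save it. My understanding is that the following are more-or-less equivalent:

using JuliaDB

# Option 1: load, chunk, and write to disk in one call
# (path is a placeholder for the location of your CSV files)
loadndsparse(path, chunks=4, output="outdir")

# Option 2: load into memory, split into 4 chunks, then save to disk
nd = loadndsparse(path)
dnd = distribute(nd, 4)
save(dnd, "outdir")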

joshday avatar Nov 19 '18 15:11 joshday
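
For a table that is already in memory rather than loaded from CSV, the same distribute-then-save route can be sketched as follows. This is only a sketch: the toy data and the output directory are hypothetical, the calls are written as JuliaDB.save and JuliaDB.load to avoid any name clashes, and the behavior has not been verified against a specific JuliaDB version.

```julia
using JuliaDB

# Hypothetical in-memory table standing in for the merged global table.
nd = ndsparse((id = [1, 2, 3, 4],), (v = [10.0, 20.0, 30.0, 40.0],))

# Split into 2 chunks, then write the chunked table to disk.
dnd = distribute(nd, 2)
outdir = joinpath(mktempdir(), "outdir")  # hypothetical destination directory
JuliaDB.save(dnd, outdir)

# Later (for example in a new session), reopen the saved table from disk.
reloaded = JuliaDB.load(outdir)
```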

OK, I see. Once the chunked table is saved, it can reside on disk or in memory, so later I can push new data into it without worrying about memory. Thanks for your help; we can close this issue.

Gyslain avatar Nov 20 '18 16:11 Gyslain