Can the file writer provide an explicit flush interface?
Hi, I'm building a storage engine using Vortex as a file format. As the caller, I want to be able to control writer memory.
My scenario is as follows:
- In the same thread, I open multiple Vortex writers (VortexWriteOptions).
- I write arrow::RecordBatch values to those writers.
- When a memory limit is hit (a hard-coded setting), I want to flush some of the Vortex writers without closing them.
So I guess an explicit flush interface would be useful.
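To make the scenario above concrete, here is a minimal sketch of the memory-limit policy in plain Rust. The `FlushableWriter` trait, `MockWriter`, and `enforce_limit` are hypothetical names invented for illustration; they are not part of Vortex, and the sketch assumes a writer can report how many bytes it buffers and can release them on flush.

```rust
/// Hypothetical interface: a writer that buffers data and can be flushed
/// without being closed. This is an assumption, not the Vortex API.
trait FlushableWriter {
    fn buffered_bytes(&self) -> usize;
    fn push(&mut self, data: &[u8]);
    /// Flushes buffered data and returns the number of bytes released.
    fn flush(&mut self) -> usize;
}

/// In-memory stand-in used only to exercise the policy.
#[derive(Default)]
struct MockWriter {
    buffered: Vec<u8>,
    flushed: usize,
}

impl FlushableWriter for MockWriter {
    fn buffered_bytes(&self) -> usize {
        self.buffered.len()
    }

    fn push(&mut self, data: &[u8]) {
        self.buffered.extend_from_slice(data);
    }

    fn flush(&mut self) -> usize {
        let released = self.buffered.len();
        self.flushed += released;
        self.buffered.clear();
        released
    }
}

/// Flushes the writers holding the most data first until the total buffered
/// size is within `limit`. Returns how many flushes were performed.
fn enforce_limit<W: FlushableWriter>(writers: &mut [W], limit: usize) -> usize {
    let mut total: usize = writers.iter().map(|w| w.buffered_bytes()).sum();
    let mut flushes = 0;
    while total > limit {
        let Some(biggest) = writers.iter_mut().max_by_key(|w| w.buffered_bytes()) else {
            break;
        };
        let released = biggest.flush();
        if released == 0 {
            break; // nothing left to release
        }
        total -= released;
        flushes += 1;
    }
    flushes
}

fn main() {
    let mut writers = vec![
        MockWriter::default(),
        MockWriter::default(),
        MockWriter::default(),
    ];
    writers[0].push(&[0u8; 100]);
    writers[1].push(&[0u8; 300]);
    writers[2].push(&[0u8; 200]);

    // 600 bytes are buffered in total; flushing the 300-byte writer is enough.
    let flushes = enforce_limit(&mut writers, 350);
    println!("flushed {flushes} writer(s)");
}
```

Flushing the largest writer first is just one possible policy; the point is that the caller needs a per-writer flush call that leaves the writer open.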
You should be able to use this function to create a push-based Writer. You can then flush your own Write object as you wish.
Does that help?
Note that this doesn't force a flush of any internally buffered arrays in the writer. The default strategy currently buffers up to 2MB per column to improve column locality in the output file. You may want to use a custom strategy to prevent this.
Thanks for the reply.
My example is:
let chunk1 =
    StructArray::from_fields(&[("numbers", buffer!(1u32; 1024 * 1024 * 16).into_array())])?.into_array();
let chunk2 =
    StructArray::from_fields(&[("numbers", buffer!(2u32; 1024 * 1024 * 16).into_array())])?.into_array();
let chunk3 =
    StructArray::from_fields(&[("numbers", buffer!(3u32; 1024 * 1024 * 16).into_array())])?.into_array();
let dtype = chunk1.dtype().clone();
let mut buf = ByteBufferMut::empty();
{
    let mut writer = VortexWriteOptions::default()
        .blocking::<CurrentThreadRuntime>()
        .writer(&mut buf, dtype.clone());
    writer.push(chunk1)?;
    // <------ I want to call `writer.flush()?` here
    writer.push(chunk2)?;
    writer.push(chunk3)?;
    let _ = writer.finish()?;
}
You said "You can then flush your own Write object as you wish." My question is: the Write object becomes an inner object of the writer, so how can I call flush on it?
That's a very good point... It's not possible with our current logic to grab hold of the writer in the middle to flush it.
We may need to change the writer to be push-based by default and spawn a task to push a stream of arrays into it, instead of defaulting to a stream-based writer and trying to wrap that in a push-based API.
I wonder if @onursatici has any thoughts?
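For discussion, the push-based shape suggested above could look roughly like the following standard-library sketch. It is not how Vortex is implemented: `PushWriter`, `Command`, and `closed` are hypothetical names, raw byte chunks stand in for arrays, and a thread plus a channel stands in for the spawned task. The handle exposes `push`, `flush`, and `finish`, and `flush` waits until the task has written everything pushed so far.

```rust
use std::io::{self, Write};
use std::sync::mpsc::{self, Sender};
use std::thread::{self, JoinHandle};

/// Messages sent from the push-based handle to the background task.
enum Command {
    Push(Vec<u8>),
    /// Carries a channel on which the task reports the flush result.
    Flush(Sender<io::Result<()>>),
}

/// Hypothetical push-based writer; the sink lives inside the task.
struct PushWriter<W> {
    tx: Sender<Command>,
    task: JoinHandle<io::Result<W>>,
}

fn closed() -> io::Error {
    io::Error::new(io::ErrorKind::BrokenPipe, "writer task stopped")
}

impl<W: Write + Send + 'static> PushWriter<W> {
    fn new(mut sink: W) -> Self {
        let (tx, rx) = mpsc::channel::<Command>();
        let task = thread::spawn(move || -> io::Result<W> {
            // Commands arrive in order, so a flush covers all earlier pushes.
            for cmd in rx {
                match cmd {
                    Command::Push(chunk) => sink.write_all(&chunk)?,
                    Command::Flush(ack) => {
                        let _ = ack.send(sink.flush());
                    }
                }
            }
            // The channel closed: the handle called `finish`.
            sink.flush()?;
            Ok(sink)
        });
        Self { tx, task }
    }

    fn push(&self, chunk: Vec<u8>) -> io::Result<()> {
        self.tx.send(Command::Push(chunk)).map_err(|_| closed())
    }

    /// Blocks until the task has written and flushed all prior pushes.
    fn flush(&self) -> io::Result<()> {
        let (ack_tx, ack_rx) = mpsc::channel();
        self.tx.send(Command::Flush(ack_tx)).map_err(|_| closed())?;
        ack_rx.recv().map_err(|_| closed())?
    }

    /// Closes the channel, waits for the task, and returns the sink.
    fn finish(self) -> io::Result<W> {
        drop(self.tx);
        self.task.join().expect("writer task panicked")
    }
}

fn main() -> io::Result<()> {
    let writer = PushWriter::new(Vec::<u8>::new());
    writer.push(vec![1u8; 4])?;
    writer.flush()?; // mid-stream flush without closing the writer
    writer.push(vec![2u8; 2])?;
    let bytes = writer.finish()?;
    println!("wrote {} bytes", bytes.len());
    Ok(())
}
```

In this sketch a flush only reaches the sink; any per-column buffering inside a real layout strategy would still need its own flush path.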