duckdb-rs

Insert support with data chunk or Arrow

Open mharmer opened this issue 4 years ago • 4 comments

I can't seem to find an interface that would allow me to insert columnar data that I already have in memory.

For Arrow specifically, I see the following C interfaces in duckdb itself for Arrow support, but they appear to be used for reading data out, or used in small inserts where the data is in the SQL query itself: https://github.com/duckdb/duckdb/blob/c3ba7e5b/src/include/duckdb.h#L1805

I recently opened a question on this in duckdb (https://github.com/duckdb/duckdb/issues/3412) and closed it since there appeared to be support (possibly only through a C++ API with DataChunk). However, since there doesn't appear to be a C API for writing to their DataChunk type, I'm thinking that one might actually be needed in order to support this from duckdb-rs.

I thought I might ask here since I think @wangfenjin has done a lot of the Arrow support in DuckDB, but I am using duckdb-rs as my primary interface.

mharmer avatar Apr 12 '22 16:04 mharmer

According to the issue you mentioned, they already support appending data chunks:

  • test example: https://github.com/duckdb/duckdb/blob/f44e77c7ec356ca6aee96dc84b0e00a6d3c6973a/test/api/capi/test_capi_data_chunk.cpp#L51
  • doc: https://duckdb.org/docs/api/c/data_chunk

We need to add an API in appender.rs. I maintain this crate in my spare time, so I can't guarantee when it will be available. It would be great if you could help with this.

wangfenjin avatar Apr 13 '22 01:04 wangfenjin

If the in-memory data is in Arrow format, we may also choose to add an API for table functions:

  • https://github.com/duckdb/duckdb/blob/f44e77c7ec/src/function/table/arrow.cpp#L1111
  • https://duckdb.org/docs/api/c/table_functions
  • https://github.com/duckdb/duckdb/blob/5079e8e7f05057b10e97d0dd028a3e1d636c798b/src/include/duckdb.h#L1330

wangfenjin avatar Apr 13 '22 01:04 wangfenjin

I can definitely take a look at adding this. I had missed the crucial duckdb_vector_get_data function from the Data Chunk API, which returns a pointer that can be both read from and written to, so I think it will cover what I need. Would this necessarily be implemented in appender.rs, though? The C API seems to have split these out sufficiently that a new datachunk.rs might make more sense.

My data is currently in Arrow format in memory, so I might try to tackle the table functions after the data chunks.
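
For anyone following along, here is a rough sketch of the "write through the pointer" pattern discussed above. It makes no DuckDB calls at all: `MockVector`, `get_data`, and `fill_in_chunks` are hypothetical names made up for illustration, with `MockVector` standing in for a vector whose buffer would come from `duckdb_vector_get_data`. The chunk size of 2048 rows is also an assumption here and should be checked against the DuckDB build in use. Where a real wrapper would set the chunk size and append the chunk, this sketch just calls a `flush` closure.

```rust
use std::slice;

/// Rows per chunk. Assumed to be 2048 for this sketch.
const VECTOR_SIZE: usize = 2048;

/// Hypothetical stand-in for a DuckDB vector: it owns a buffer and hands out
/// a raw writable pointer, similar in spirit to `duckdb_vector_get_data`.
struct MockVector {
    data: Vec<i64>,
}

impl MockVector {
    fn new() -> Self {
        MockVector { data: vec![0; VECTOR_SIZE] }
    }

    /// Returns a raw pointer to the start of the buffer.
    fn get_data(&mut self) -> *mut i64 {
        self.data.as_mut_ptr()
    }
}

/// Copies `column` into the vector one chunk at a time, calling `flush` with
/// the rows written for each chunk. Returns the number of rows in each chunk.
fn fill_in_chunks(
    column: &[i64],
    vector: &mut MockVector,
    mut flush: impl FnMut(&[i64]),
) -> Vec<usize> {
    let mut sizes = Vec::new();
    for part in column.chunks(VECTOR_SIZE) {
        let ptr = vector.get_data();
        // SAFETY: `ptr` points to VECTOR_SIZE initialized i64 values owned by
        // `vector`, `part.len()` never exceeds VECTOR_SIZE, and no other
        // reference to the buffer is live while `dst` is in use.
        let dst = unsafe { slice::from_raw_parts_mut(ptr, part.len()) };
        dst.copy_from_slice(part);
        // A real wrapper would set the chunk size and append the chunk here.
        flush(dst);
        sizes.push(part.len());
    }
    sizes
}
```

Calling `fill_in_chunks` with a 5000-row column and a closure that collects the flushed rows should yield three chunks, the last one partial. Keeping the unsafe slice construction inside one function is one argument for a separate datachunk.rs: the pointer handling stays in one place and the appender only sees safe slices.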

mharmer avatar Apr 13 '22 21:04 mharmer

Yes, you are free to create a new file for this

wangfenjin avatar Apr 14 '22 00:04 wangfenjin