Insert support with data chunk or Arrow
I can't seem to find an interface that would let me insert columnar data I already have in memory.
For Arrow specifically, I see the following C interfaces in duckdb itself for Arrow support, but they appear to be used for reading data out, or used in small inserts where the data is in the SQL query itself: https://github.com/duckdb/duckdb/blob/c3ba7e5b/src/include/duckdb.h#L1805
I recently opened a question on this in duckdb (https://github.com/duckdb/duckdb/issues/3412) and closed it since there appeared to be support (possibly only through a C++ API with DataChunk). However, since there doesn't appear to be a C API for writing to their DataChunk type, I think one might actually be necessary in order to support this from duckdb-rs.
I thought I might ask here since I think @wangfenjin has done a lot of the Arrow support in DuckDB, but I am using duckdb-rs as my primary interface.
According to the issue you mentioned, they already support appending data chunks:
- test example: https://github.com/duckdb/duckdb/blob/f44e77c7ec356ca6aee96dc84b0e00a6d3c6973a/test/api/capi/test_capi_data_chunk.cpp#L51
- doc: https://duckdb.org/docs/api/c/data_chunk
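Since a data chunk holds a bounded number of rows, an in-memory column would likely need to be appended in batches. Below is a minimal sketch of that batching step only, using nothing from DuckDB itself. The helper `chunk_ranges` and the constant `ASSUMED_VECTOR_SIZE` are hypothetical names; 2048 is assumed here as the vector size, and real code should ask the library via `duckdb_vector_size()` instead.

```rust
use std::ops::Range;

/// Assumed DuckDB vector size; real code should query `duckdb_vector_size()`.
const ASSUMED_VECTOR_SIZE: usize = 2048;

/// Hypothetical helper: split `len` rows into consecutive ranges of at most
/// `vector_size` rows, one range per data chunk to append.
fn chunk_ranges(len: usize, vector_size: usize) -> Vec<Range<usize>> {
    assert!(vector_size > 0, "vector size must be non-zero");
    (0..len)
        .step_by(vector_size)
        .map(|start| start..(start + vector_size).min(len))
        .collect()
}
```

Each returned range would then correspond to one chunk: fill its vectors from that slice of the column, set the chunk size to the range length, and append it.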
We need to add an API for this in appender.rs. I maintain this crate in my spare time, so I can't guarantee when it will be available. It would be great if you could help with this.
If the in-memory data is in Arrow format, we could also add an API for table functions:
- https://github.com/duckdb/duckdb/blob/f44e77c7ec/src/function/table/arrow.cpp#L1111
- https://duckdb.org/docs/api/c/table_functions
- https://github.com/duckdb/duckdb/blob/5079e8e7f05057b10e97d0dd028a3e1d636c798b/src/include/duckdb.h#L1330
I can definitely take a look at adding this. I had missed the crucial duckdb_vector_get_data from the Data Chunk API, which returns a pointer that can be both read from and written to, so I think it will cover what I need. Would this necessarily be implemented in appender.rs, though? The C API seems to have split these out enough that a new datachunk.rs might make more sense.
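To illustrate the write path, here is a rough sketch of treating such a raw pointer as a typed column buffer in Rust. It does not call DuckDB at all: the pointer in this example comes from a plain `Vec`, whereas in a real binding it would come from `duckdb_vector_get_data`. The helper name `fill_i64_column` and its `capacity` parameter are hypothetical, and the column is assumed to be a 64-bit integer type.

```rust
use std::ffi::c_void;
use std::slice;

/// Hypothetical helper: copy up to `capacity` values into a raw column buffer
/// and return how many rows were written.
///
/// # Safety
/// `data` must be non-null, aligned for `i64`, and valid for writes of
/// `capacity` consecutive `i64` values.
unsafe fn fill_i64_column(data: *mut c_void, capacity: usize, values: &[i64]) -> usize {
    // Never write past the end of the buffer.
    let n = values.len().min(capacity);
    // View the untyped buffer as a mutable i64 slice of length `n`.
    let out = unsafe { slice::from_raw_parts_mut(data as *mut i64, n) };
    out.copy_from_slice(&values[..n]);
    n
}
```

The returned row count is what would then be passed to `duckdb_data_chunk_set_size` before handing the chunk to `duckdb_append_data_chunk`.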
I do have an arrow format currently in memory, so I might try to tackle the table functions after the data chunks.
Yes, you are free to create a new file for this.