Binding StringIO not supported
It would be great if you can also bind StringIO-objects, so that this code is possible:
require 'duckdb'
require 'stringio'
db = DuckDB::Database.open
csv_data = "id,name,age\n1,Alice,30\n2,Bob,25\n3,Charlie,35"
csv_io = StringIO.new(csv_data)
db.connect do |con|
con.query("CREATE TABLE test AS SELECT * FROM read_csv_auto(?, header = true)", csv_io)
end
At the moment it leads to the error:
... /3.3.0/gems/duckdb-1.2.0.0/lib/duckdb/prepared_statement.rb:288:in `bind_with_index': not supported type `#<StringIO:0x000001f1ceba5380>` (StringIO) (DuckDB::Error)
I don't know how to pass StringIO object to read_csv, read_csv_auto now. read_csv_auto(read_csv) might accept only string matching file path or URL...
$ ./build/release/duckdb
v1.2.1 8e52ec4395
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D SELECT * from read_csv_auto('1,2,3');
IO Error:
No files found that match the pattern "1,2,3" #=> expect string '1,2,3' matching file path or URL?
In Python the following code works:
import duckdb
from io import StringIO
string = StringIO("c1,c2,c3\na,b,c")
duckdb.read_csv(string, header=True)
Gemini 2.5 provides the following information:
DuckDB's read_csv function is designed to read from various sources, including local files, remote URLs (like S3, HTTP), and also Python file-like objects (which io.StringIO is). To handle this variety of input sources in a standardized way, DuckDB relies on the fsspec library.
In your code you are providing a String '1,2,3', not an StringIO-object. Gemini provides the following answer:
Okay, you can't directly pass the raw CSV string inside the SQL query like you would use a StringIO object in Python, because read_csv_auto expects a file path or URL as its argument.
However, you can achieve the equivalent of providing in-memory data by using standard input (stdin) combined with shell redirection or piping. The DuckDB CLI can read from stdin if you specify a special path, typically /dev/stdin.
Ok, I meant that the following python code does not work, so I said that I don't know how to pass StringIO object to read_csv.
import duckdb
import io
s = io.StringIO('C1,C2\n1,2\n3,4\n5,6')
duckdb.sql('CREATE TABLE t AS SELECT * FROM read_csv_auto($1)', s)
But do you mean that you want read_csv_auto(read_csv) method like python?
requre 'duckdb'
db = DuckDB::Database.open
csv_data = "id,name,age\n1,Alice,30\n2,Bob,25\n3,Charlie,35"
csv_io = StringIO.new(csv_data)
db.connect do |con|
con.read_csv_auto(csv_io)
end
Essentially, read_csv_auto is a convenient shorthand for read_csv with the AUTO_DETECT option enabled.
If my understanding is correct, then the ruby-duckdb gem primarily acts as a wrapper around the DuckDB C++ library, exposing its functionality mainly through the execution of SQL queries (con.query(sql_stmts)). DuckDB's underlying C++ API does have advanced features like "Replacement Scans" (duckdb_add_replacement_scan) that allow client applications to provide custom data sources.
Theoretically, the ruby-duckdb gem could use this C++ feature to expose a way to read from a Ruby StringIO object without a temporary file.
It would be great, if the read_csv and read_csv_auto methods could handle a StringIO object like in your example.