ruby-duckdb icon indicating copy to clipboard operation
ruby-duckdb copied to clipboard

Binding StringIO not supported

Open pere73 opened this issue 9 months ago • 4 comments

It would be great if you can also bind StringIO-objects, so that this code is possible:

require 'duckdb'
require 'stringio'

db = DuckDB::Database.open

csv_data = "id,name,age\n1,Alice,30\n2,Bob,25\n3,Charlie,35"
csv_io = StringIO.new(csv_data)

db.connect do |con|
  con.query("CREATE TABLE test AS SELECT * FROM read_csv_auto(?, header = true)", csv_io)
end

At the moment it leads to the error:

... /3.3.0/gems/duckdb-1.2.0.0/lib/duckdb/prepared_statement.rb:288:in `bind_with_index': not supported type `#<StringIO:0x000001f1ceba5380>` (StringIO) (DuckDB::Error)

pere73 avatar Mar 19 '25 09:03 pere73

I don't know how to pass StringIO object to read_csv, read_csv_auto now. read_csv_auto(read_csv) might accept only string matching file path or URL...

$ ./build/release/duckdb
v1.2.1 8e52ec4395
Enter ".help" for usage hints.
Connected to a transient in-memory database.
Use ".open FILENAME" to reopen on a persistent database.
D SELECT * from read_csv_auto('1,2,3');
IO Error:
No files found that match the pattern "1,2,3" #=> expect string '1,2,3' matching file path or URL?

suketa avatar Mar 28 '25 22:03 suketa

In Python the following code works:

import duckdb
from io import StringIO

string = StringIO("c1,c2,c3\na,b,c")
duckdb.read_csv(string, header=True)

Gemini 2.5 provides the following information:

DuckDB's read_csv function is designed to read from various sources, including local files, remote URLs (like S3, HTTP), and also Python file-like objects (which io.StringIO is). To handle this variety of input sources in a standardized way, DuckDB relies on the fsspec library.

In your code you are providing a String '1,2,3', not an StringIO-object. Gemini provides the following answer:

Okay, you can't directly pass the raw CSV string inside the SQL query like you would use a StringIO object in Python, because read_csv_auto expects a file path or URL as its argument.

However, you can achieve the equivalent of providing in-memory data by using standard input (stdin) combined with shell redirection or piping. The DuckDB CLI can read from stdin if you specify a special path, typically /dev/stdin.

pere73 avatar Mar 29 '25 12:03 pere73

Ok, I meant that the following python code does not work, so I said that I don't know how to pass StringIO object to read_csv.

import duckdb
import io
s = io.StringIO('C1,C2\n1,2\n3,4\n5,6')
duckdb.sql('CREATE TABLE t AS SELECT * FROM read_csv_auto($1)', s)

But do you mean that you want read_csv_auto(read_csv) method like python?

requre 'duckdb'

db = DuckDB::Database.open

csv_data = "id,name,age\n1,Alice,30\n2,Bob,25\n3,Charlie,35"
csv_io = StringIO.new(csv_data)

db.connect do |con|
  con.read_csv_auto(csv_io)
end

suketa avatar Mar 29 '25 20:03 suketa

Essentially, read_csv_auto is a convenient shorthand for read_csv with the AUTO_DETECT option enabled.

If my understanding is correct, then the ruby-duckdb gem primarily acts as a wrapper around the DuckDB C++ library, exposing its functionality mainly through the execution of SQL queries (con.query(sql_stmts)). DuckDB's underlying C++ API does have advanced features like "Replacement Scans" (duckdb_add_replacement_scan) that allow client applications to provide custom data sources.

Theoretically, the ruby-duckdb gem could use this C++ feature to expose a way to read from a Ruby StringIO object without a temporary file.

It would be great, if the read_csv and read_csv_auto methods could handle a StringIO object like in your example.

pere73 avatar Mar 30 '25 12:03 pere73