
Support for reading CSVs from a stream

Open willm opened this issue 1 year ago • 2 comments

The CLI supports reading CSV data from stdin. I'm wondering if there's similar support for reading from a Node.js stream, to avoid having to write a CSV file to disk first?

willm avatar Nov 23 '24 08:11 willm

I'm assuming you'd have to do something similar to the httpfs extension and override the OpenFile method?

willm avatar Nov 23 '24 10:11 willm

Another approach that avoids writing more C++ is to use the httpfs extension: actually start an HTTP server during the import and pipe your CSV stream into the response stream. This works, but feels a bit hacky. Here's an example using the fast-csv library.

const duckdb = require("duckdb");
const {createServer} = require("http");
const {format} = require("@fast-csv/format");

const db = new duckdb.Database(":memory:");
const con = db.connect();

con.run(`CREATE TABLE product (name VARCHAR);`);

// Serve the CSV over HTTP so read_csv can consume it via httpfs.
const server = createServer((req, res) => {
  // httpfs probes the resource with a HEAD request before reading it.
  if (req.method === "HEAD") {
    res.writeHead(200, {});
    res.end();
    return;
  }
  if (req.url === "/csv") {
    res.writeHead(200, {"Content-Type": "text/csv"});
    const csvStream = format({
      delimiter: ",",
      headers: ["name"],
    });
    // Pipe the formatted CSV rows straight into the HTTP response.
    csvStream.pipe(res);
    csvStream.write(["test"]);
    csvStream.end();
  } else {
    res.writeHead(404);
    res.end();
  }
});
server.listen(4444);

con.run(
  `
INSERT INTO product (name)
SELECT  name
FROM read_csv('http://localhost:4444/csv', header = true, delim = ',', columns = {
  'name': 'VARCHAR'
})
`,
  (err) => {
    if (err) {
      throw err;
    }
    server.close();
  }
);

willm avatar Nov 24 '24 21:11 willm