Could DuckDBClient load (CSV) files by URL?
I'd like an equivalent of
gentoo = d3.csv("https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.220.7&entityid=e03b43c924f226486f2f0ab6709d2381", d3.autoType)
with DuckDBClient.
Both
DuckDBClient.of({
  gentoo: "https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.220.7&entityid=e03b43c924f226486f2f0ab6709d2381"
})
and
DuckDBClient.of({
  gentoo: {
    file: "https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.220.7&entityid=e03b43c924f226486f2f0ab6709d2381"
  }
})
won't work.
But this, simulating a FileAttachment structure, will work:
db = {
  const gentoo = {
    url: () => "https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.220.7&entityid=e03b43c924f226486f2f0ab6709d2381",
    mimeType: "text/csv",
    name: "gentoo"
  };
  return DuckDBClient.of({
    gentoo: {file: gentoo}
  });
}
although it's rather hard to remember.
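In the meantime, the workaround can be wrapped in a small helper so the object shape only has to be written once. This is just a sketch: `remoteFile` is a hypothetical name, not part of DuckDBClient, and it assumes that the three properties used above (`url`, `mimeType`, `name`) are all the client needs.

```javascript
// Hypothetical helper (not part of DuckDBClient): builds the minimal
// FileAttachment-like object used in the workaround above.
function remoteFile(name, url, mimeType = "text/csv") {
  return {
    name,
    mimeType,
    url: () => url // the workaround exposes url as a function, so this does too
  };
}

const gentooFile = remoteFile(
  "gentoo",
  "https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.220.7&entityid=e03b43c924f226486f2f0ab6709d2381"
);

// In a notebook cell this would be passed as before:
// db = DuckDBClient.of({gentoo: {file: gentooFile}})
```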
I'd love something simple and intuitive, like:
DuckDBClient.of({
  gentoo: {
    url: "https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.220.7&entityid=e03b43c924f226486f2f0ab6709d2381",
    fileType: "csv"
  }
})
where fileType could also be 'json', for instance (or 'parquet', 'arrow', ...).
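To make the idea concrete, here is a rough sketch of how such a `{url, fileType}` spec could be translated into the shape that already works today. Everything in it is an assumption for illustration: `fromUrlSpec` and `MIME_TYPES` are hypothetical names, and only 'csv' and 'json' are mapped because the MIME types the client would expect for 'parquet' or 'arrow' are not established in this thread.

```javascript
// Assumed mapping from the proposed fileType values to MIME types.
// Only csv and json are covered; parquet and arrow are left out on purpose.
const MIME_TYPES = new Map([
  ["csv", "text/csv"],
  ["json", "application/json"]
]);

// Hypothetical adapter: turns {url, fileType} into the {file: ...} source
// object that the FileAttachment-like workaround relies on.
function fromUrlSpec(name, {url, fileType}) {
  const mimeType = MIME_TYPES.get(fileType);
  if (mimeType === undefined) {
    throw new Error(`unsupported fileType: ${fileType}`);
  }
  return {file: {name, mimeType, url: () => url}};
}

// In a notebook cell:
// db = DuckDBClient.of({gentoo: fromUrlSpec("gentoo", {url: "https://...", fileType: "csv"})})
```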
That sounds reasonable to me. 👍
In theory, we could also make a HEAD request for the file to get the MIME type, and then we might be able to make the type optional if the content-type response header is present. That might allow this:
DuckDBClient.of({
  gentoo: "https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.220.7&entityid=e03b43c924f226486f2f0ab6709d2381"
})
Or this:
DuckDBClient.of({
  gentoo: {
    url: "https://portal.edirepository.org/nis/dataviewer?packageid=knb-lter-pal.220.7&entityid=e03b43c924f226486f2f0ab6709d2381"
  }
})
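The HEAD-request idea could look roughly like the sketch below. It is not an implementation from the library: `parseContentType` and `detectMimeType` are hypothetical names, and the `fetchImpl` parameter exists only so the function can be exercised without a network. It uses the standard `fetch` API and returns `undefined` whenever the type cannot be determined.

```javascript
// Extracts the bare MIME type from a content-type header value,
// e.g. "text/csv; charset=utf-8" becomes "text/csv".
function parseContentType(header) {
  if (!header) return undefined;
  const type = header.split(";")[0].trim().toLowerCase();
  return type || undefined;
}

// Hypothetical helper: asks the server for headers only (no body) and
// reports the MIME type, or undefined if the request fails or the
// content-type header is missing.
async function detectMimeType(url, fetchImpl = fetch) {
  const response = await fetchImpl(url, {method: "HEAD"});
  if (!response.ok) return undefined;
  return parseContentType(response.headers.get("content-type"));
}
```

An explicit type would probably still be needed as a fallback: some servers reject HEAD requests, cross-origin rules can block the request, and the header may be absent or too generic (for example application/octet-stream) to pick a loader.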