sqlite-parquet-vtable
Support globbing for multiple parquet files?
Is there a way I can load multiple parquet files?
My first guess brings back the following IOError from Arrow:
CREATE VIRTUAL TABLE trips USING parquet('parquet/*');
Error: Arrow error: IOError: Failed to open local file: parquet/* , error: No such file or directory
For the record there are two parquet files in that folder.
$ ls -l parquet/00000*
-rw-rw-r-- 1 mark mark 2079571240 Jun 25 05:58 parquet/000000_0
-rw-rw-r-- 1 mark mark 2053401839 Jun 25 06:07 parquet/000001_0
I'd like to support this more seamlessly in the future, either by supporting a glob or, like Hive, taking a directory to query. There are some internal design things I'd have to think about first, though.
Until then, the best I can suggest is to create N tables, one per parquet file, then create a view that UNION ALLs the tables. If you invoke sqlite like sqlite3 mydb.db, it will persist the view so that future invocations of sqlite don't need to recreate it.
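As a rough illustration of that workaround, here is a small Python sketch that generates the statements for you. It is not part of the extension: the helper names (quote_ident, quote_literal, union_view_sql) and the trips_0, trips_1, ... table naming are made up for this example, and it assumes every file has the same schema so that UNION ALL lines up.

```python
import glob


def quote_ident(name):
    """Quote an SQLite identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(text):
    """Quote an SQLite string literal, doubling any embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def union_view_sql(paths, view_name="trips"):
    """Build one CREATE VIRTUAL TABLE per file, then a UNION ALL view over them.

    The per-file table names (<view_name>_0, <view_name>_1, ...) are a
    hypothetical convention chosen for this sketch.
    """
    if not paths:
        raise ValueError("no parquet files matched")
    statements = []
    selects = []
    for i, path in enumerate(sorted(paths)):
        table = quote_ident(f"{view_name}_{i}")
        statements.append(
            f"CREATE VIRTUAL TABLE {table} USING parquet({quote_literal(path)});"
        )
        selects.append(f"SELECT * FROM {table}")
    # The view presents all per-file tables as a single table.
    statements.append(
        f"CREATE VIEW {quote_ident(view_name)} AS\n"
        + "\nUNION ALL\n".join(selects)
        + ";"
    )
    return statements


if __name__ == "__main__":
    # Expand the glob in Python, since the extension does not do it.
    for statement in union_view_sql(glob.glob("parquet/*")):
        print(statement)
```

Run against the two files above, this should print two CREATE VIRTUAL TABLE statements followed by a CREATE VIEW trips that UNION ALLs them. You can paste that output into a sqlite3 mydb.db session once the parquet extension is loaded.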