glaredb
Chore: Document how to connect to S3/GCS bucket from cli and python lib
Description
I'm not sure what it looks like right now.
import glaredb
# What do I put for storage options?
con = glaredb.connect('gs://bucket/path', storage_options=?)
Can just leave stuff in comments here and we can transfer over to docs whenever.
Here is a brief tour of the new options (from an actual IPython session):
In [1]: import glaredb
In [2]: con_1 = glaredb.connect(location="memory://") # connect to in-memory store
In [3]: con_1.sql("create table test_1 as values (1, 'one'), (2, 'two')")
Out[3]: Noop
In [4]: con_1.sql("select * from test_1").to_arrow()
Out[4]:
pyarrow.Table
column1: int64
column2: string
----
column1: [[1,2]]
column2: [["one","two"]]
In [5]: con_2 = glaredb.connect(location="../../data-dir") # connect to local file system
In [6]: con_2.sql("create table test_2 as values (3, 'three'), (4, 'four')")
Out[6]: Noop
In [7]: con_2.sql("select * from test_2").to_arrow()
Out[7]:
pyarrow.Table
column1: int64
column2: string
----
column1: [[3,4]]
column2: [["three","four"]]
In [8]: !tree ../../data-dir
../../data-dir
└── databases
    └── 00000000-0000-0000-0000-000000000000
        ├── tables
        │   └── 20000
        │       ├── _delta_log
        │       │   ├── 00000000000000000000.json
        │       │   └── 00000000000000000001.json
        │       └── part-00001-f3852b16-ee29-403e-9926-e56d6689bbaa-c000.snappy.parquet
        ├── tmp
        │   └── 11cbfd08-fa8c-47fb-abc4-8a2bee166220
        └── visible
            ├── catalog.0
            ├── catalog.1
            ├── lease
            └── metadata
8 directories, 7 files
In [9]: con_3 = glaredb.connect(
...: location="gs://glaredb-test-bucket/path/to/some/folder",
...: storage_options=dict(service_account_path="/tmp/fake-gcs-creds.json")
...: ) # connect to a fake GCS server, and use a specific path
In [10]: con_3.sql("create table test_3 as values (5, 'five'), (6, 'six')")
Out[10]: Noop
In [11]: con_3.sql("select * from test_3").to_arrow()
Out[11]:
pyarrow.Table
column1: int64
column2: string
----
column1: [[5,6]]
column2: [["five","six"]]
In [12]: !curl -s --insecure http://0.0.0.0:4443/storage/v1/b/glaredb-test-bucket/o | jq .
{
  "kind": "storage#objects",
  "items": [
    {
      "kind": "storage#object",
      "name": "path/to/some/folder/databases/00000000-0000-0000-0000-000000000000/tables/20000/_delta_log/00000000000000000000.json",
      "id": "glaredb-test-bucket/path/to/some/folder/databases/00000000-0000-0000-0000-000000000000/tables/20000/_delta_log/00000000000000000000.json",
      "bucket": "glaredb-test-bucket",
      "size": "1223",
      "crc32c": "94JF+A==",
      "md5Hash": "g/35a5HOkx0W0uaZmmSIMw==",
      "etag": "\"g/35a5HOkx0W0uaZmmSIMw==\"",
      "timeCreated": "2023-10-19T09:06:30.710542Z",
      "updated": "2023-10-19T09:06:30.710552Z",
      "generation": "1697706390710661"
    },
    ...
  ]
}
In [13]: con_4 = glaredb.connect(
...: location="http://localhost:9000/glaredb-test-bucket/some/sub/directory",
...: storage_options={"access_key_id": "glaredb", "secret_access_key": "glaredb_test"}
...: ) # connect to a MinIO server to test the S3 object store family
In [14]: con_4.sql("create table test_4 as values (7, 'seven'), (8, 'eight')")
Out[14]: Noop
In [15]: con_4.sql("select * from test_4").to_arrow()
Out[15]:
pyarrow.Table
column1: int64
column2: string
----
column1: [[7,8]]
column2: [["seven","eight"]]
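For whoever moves this into the docs: the four location forms in the session differ only by prefix. Here is a rough sketch of that mapping, so the docs table can be written from it. The helper name classify_location and the returned labels are made up for illustration and are not part of glaredb. How glaredb itself dispatches on the location is not shown here, so treat the http(s) branch as an assumption based only on the MinIO example.

```python
from urllib.parse import urlsplit


def classify_location(location: str) -> str:
    """Label a location string the way the session above uses it.

    Hypothetical helper for documentation purposes only; it mirrors the
    four examples in the session and nothing more.
    """
    scheme = urlsplit(location).scheme.lower()
    if scheme == "memory":
        return "in-memory"
    if scheme == "gs":
        return "gcs"
    if scheme in ("http", "https"):
        # Assumption: an HTTP(S) endpoint is an S3-compatible server,
        # as with the MinIO example in the session.
        return "s3-compatible"
    # Anything else is treated as a local file system path.
    return "local"


if __name__ == "__main__":
    for loc in (
        "memory://",
        "../../data-dir",
        "gs://glaredb-test-bucket/path/to/some/folder",
        "http://localhost:9000/glaredb-test-bucket/some/sub/directory",
    ):
        print(f"{loc} -> {classify_location(loc)}")
```

Note that this simple prefix check would mislabel a Windows path such as C:\data only in name (it still falls through to "local"), which is enough for a docs table but not for real dispatch logic.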
The storage_options kwarg can be omitted, and then the required params will be inferred from the environment (GOOGLE_APPLICATION_CREDENTIALS for GCP and AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY for S3).
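To make the environment fallback concrete, here is a minimal sketch that builds the same storage_options dict by hand from those variables. glaredb does this inference itself when the kwarg is omitted, so this is not needed in practice. The helper name storage_options_from_env is hypothetical, and the pairing of each variable with an option key (service_account_path, access_key_id, secret_access_key) is an assumption drawn from the session examples, not from the glaredb source.

```python
import os
from typing import Mapping, Optional


def storage_options_from_env(
    location: str, env: Optional[Mapping[str, str]] = None
) -> dict:
    """Return the storage_options that the environment would supply.

    Hypothetical helper: it only mirrors the variables named in this
    thread and returns an empty dict when nothing applicable is set.
    """
    env = os.environ if env is None else env
    if location.startswith("gs://"):
        creds = env.get("GOOGLE_APPLICATION_CREDENTIALS")
        return {"service_account_path": creds} if creds else {}
    if location.startswith(("http://", "https://")):
        # Assumption: HTTP(S) locations are S3-compatible endpoints,
        # as in the MinIO example.
        key = env.get("AWS_ACCESS_KEY_ID")
        secret = env.get("AWS_SECRET_ACCESS_KEY")
        if key and secret:
            return {"access_key_id": key, "secret_access_key": secret}
        return {}
    # memory:// and local paths need no credentials.
    return {}


if __name__ == "__main__":
    fake_env = {"GOOGLE_APPLICATION_CREDENTIALS": "/tmp/fake-gcs-creds.json"}
    print(storage_options_from_env("gs://glaredb-test-bucket/path", fake_env))
```

The result could then be passed explicitly, e.g. glaredb.connect(location=..., storage_options=storage_options_from_env(...)), which should be equivalent to omitting the kwarg if the mapping above is right.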