iceberg-python
Cannot specify "file://" uri or direct local file location for warehouse in Windows
Apache Iceberg version
0.7.0 (latest release)
Please describe the bug 🐞
When running PyIceberg on Windows with the local file system as the warehouse location, the location cannot be specified either as a file:// URI or as an absolute path that contains a drive letter, e.g. C:.
The cause is PyArrowFileIO.parse_location().
The following code demonstrates the bug.
I have also prepared a Pull Request for the fix: https://github.com/apache/iceberg-python/pull/996
import tempfile
from pathlib import Path
import pyarrow as pa
from pyiceberg.catalog import load_catalog, Catalog
with tempfile.TemporaryDirectory() as tmpdir:
    catalog = load_catalog(
        **{
            "type": "sql",
            "uri": f"sqlite://",  # in-memory sqlite
            # "warehouse": Path(tmpdir).as_uri().replace("C:/", ""),  # Successful
            "warehouse": Path(tmpdir).as_uri(),  # ERROR: OSError: [WinError 123] Failed querying information for path
                                                 # '/C:/Users/<redacted>>/AppData/Local/Temp/tmp9rjtnjm7/foobar.db/baz/
                                                 # metadata/00000-90a23d06-aeae-4dab-8e15-892f70c83578.metadata.json'.
                                                 # Detail: [Windows error 123] The filename, directory name, or volume
                                                 # label syntax is incorrect.
            # "warehouse": tmpdir,  # ERROR: ValueError: Unrecognized filesystem type in URI: c
        },
    )

    namespace = "foobar"
    table_name = "baz"
    data = pa.Table.from_pydict({"a": [1, 2, 3], "b": [4, 5, 6]})

    catalog.create_namespace(namespace)
    identifier = Catalog.identifier_to_tuple(f"{namespace}.{table_name}")
    table = catalog.create_table(identifier, schema=data.schema)
    print(table)
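
Both error messages above are consistent with how Python's standard urllib.parse.urlparse splits Windows locations, assuming parse_location() relies on it (an assumption here, not verified against the PyIceberg source). The sketch below uses only the standard library and runs on any OS. The helper to_local_path is hypothetical, written for illustration only; it is not the fix proposed in the linked Pull Request, and the sample paths are made up.

```python
import re
from urllib.parse import urlparse

# A bare Windows path: the drive letter is parsed as the URI scheme,
# which matches "Unrecognized filesystem type in URI: c".
bare = urlparse("C:/Users/me/warehouse")
print(bare.scheme, bare.path)

# A file:// URI as produced by Path.as_uri() on Windows: the parsed path
# keeps a slash in front of the drive letter, which matches the
# '/C:/Users/...' path in the WinError 123 message.
uri = urlparse("file:///C:/Users/me/warehouse")
print(uri.scheme, uri.path)


def to_local_path(location: str) -> str:
    """Hypothetical helper: turn a warehouse location into a local path."""
    # Bare drive-letter path (C:/... or C:\...): return it unchanged
    # instead of letting urlparse treat "C" as a scheme.
    if re.match(r"^[A-Za-z]:[\\/]", location):
        return location
    parsed = urlparse(location)
    # file:///C:/... parses to /C:/...; drop the slash before the drive letter.
    if parsed.scheme == "file" and re.match(r"^/[A-Za-z]:/", parsed.path):
        return parsed.path[1:]
    return parsed.path


print(to_local_path("file:///C:/Users/me/warehouse"))
```

Checking for a drive letter before calling urlparse keeps the two failure modes separate: the scheme misdetection for bare paths, and the extra leading slash for file:// URIs.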