delta-rs
Problem writing tables in directory named with char `~`
Environment
Delta-rs version: Python deltalake 0.12.0.
Binding: Python deltalake 0.12.0.
Environment: Local, Python 3.11.
OS: Mac OS.
Bug
What happened:
I am working on a local macOS machine (ARM) and using the Python deltalake binding to read and write Delta tables. I need to store some Delta tables in the directory synced with iCloud, which is named $HOME/Library/Mobile Documents/com~apple~CloudDocs. I also have a symlink in my home directory, $HOME/iCloud, pointing to this location, which I typically use instead of the full path.
When I call the write_deltalake function, the following error is raised: OSError: Encountered object with invalid path: Error parsing Path "/Users/***/Library/Mobile%20Documents/com~apple~CloudDocs/testing_deltalake/df": Encountered illegal character sequence "~" whilst parsing path segment "com~apple~CloudDocs". I am passing the symlink in the write_deltalake call, so it seems that:
- write_deltalake is resolving the symlink.
- write_deltalake has problems handling the ~ char.
A similar error occurs when using DeltaTable to load an existing table.
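To check whether a given location would be affected before handing it to deltalake, the resolved path can be inspected with the standard library alone. The sketch below is only an illustration: `tilde_segments` is a hypothetical helper, it does not call deltalake, and it does not reproduce the actual path validation. It assumes, based on the error message above, that the path is resolved before parsing and that any segment containing `~` is rejected.

```python
import os
import tempfile
from pathlib import Path


def tilde_segments(path):
    """Return the components of the symlink-resolved path that contain '~'.

    Hypothetical helper: it only mirrors the symptom described in this
    issue (a '~' inside a path segment), not the library's real parser.
    """
    resolved = Path(os.path.realpath(path))
    return [part for part in resolved.parts if "~" in part]


if __name__ == "__main__":
    # Simulate the setup from the report: a real directory whose name
    # contains '~', reached through a symlink with a harmless name.
    with tempfile.TemporaryDirectory(suffix="com~apple~CloudDocs") as real_dir:
        with tempfile.TemporaryDirectory() as link_parent:
            link = os.path.join(link_parent, "iCloud")
            os.symlink(real_dir, link)
            # The symlink name has no '~', but the resolved target does.
            print(tilde_segments(link))
```

A non-empty result means the resolved path contains at least one segment that, under the assumption above, would trigger the error.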
What you expected to happen:
Regardless of whether the directory syncs with iCloud, or how canonical it is to keep Delta tables there, this is a fully valid path on this OS and I think it should be possible to read and write tables in it. In fact, the error persists for any directory whose name contains ~.
I leave it up to the developers to decide whether symbolic links should be resolved or not; I lack the context to take a position on this.
How to reproduce it:
from tempfile import mkdtemp

import pandas as pd
from deltalake import write_deltalake

df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5],
        "fruits": ["banana", "orange", "mango", "apple", "banana"],
    }
)

# dest = Path().home() / "iCloud" / "testing_deltalake" / "df"
dest = mkdtemp(suffix="com~apple~CloudDocs")

write_deltalake(
    table_or_uri=dest,
    data=df,
)
Error:
{
"name": "OSError",
"message": "Encountered object with invalid path: Error parsing Path \"/private/var/folders/2h/xg9cljwj14d_nspp3fjk4pyw0000gn/T/tmpqh27vtamcom~apple~CloudDocs\": Encountered illegal character sequence \"~\" whilst parsing path segment \"tmpqh27vtamcom~apple~CloudDocs\"",
"stack": "---------------------------------------------------------------------------
OSError Traceback (most recent call last)
Cell In[13], line 16
13 # dest = Path().home() / \"iCloud\" / \"testing_deltalake\" / \"df\"
14 dest = mkdtemp(suffix=\"com~apple~CloudDocs\")
---> 16 write_deltalake(
17 table_or_uri=dest,
18 data=df,
19 )
File ~/.cache/pypoetry/virtualenvs/***-***-py3.11/lib/python3.11/site-packages/deltalake/writer.py:153, in write_deltalake(table_or_uri, data, schema, partition_by, filesystem, mode, file_options, max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, max_rows_per_group, name, description, configuration, overwrite_schema, storage_options, partition_filters, large_dtypes)
150 else:
151 data, schema = delta_arrow_schema_from_pandas(data)
--> 153 table, table_uri = try_get_table_and_table_uri(table_or_uri, storage_options)
155 # We need to write against the latest table version
156 if table:
File ~/.cache/pypoetry/virtualenvs/***-***-py3.11/lib/python3.11/site-packages/deltalake/writer.py:417, in try_get_table_and_table_uri(table_or_uri, storage_options)
414 raise ValueError(\"table_or_uri must be a str, Path or DeltaTable\")
416 if isinstance(table_or_uri, (str, Path)):
--> 417 table = try_get_deltatable(table_or_uri, storage_options)
418 table_uri = str(table_or_uri)
419 else:
File ~/.cache/pypoetry/virtualenvs/***-***-py3.11/lib/python3.11/site-packages/deltalake/writer.py:430, in try_get_deltatable(table_uri, storage_options)
426 def try_get_deltatable(
427 table_uri: Union[str, Path], storage_options: Optional[Dict[str, str]]
428 ) -> Optional[DeltaTable]:
429 try:
--> 430 return DeltaTable(table_uri, storage_options=storage_options)
431 except TableNotFoundError:
432 return None
File ~/.cache/pypoetry/virtualenvs/***-***-py3.11/lib/python3.11/site-packages/deltalake/table.py:250, in DeltaTable.__init__(self, table_uri, version, storage_options, without_files, log_buffer_size)
231 \"\"\"
232 Create the Delta Table from a path with an optional version.
233 Multiple StorageBackends are currently supported: AWS S3, Azure Data Lake Storage Gen2, Google Cloud Storage (GCS) and local URI.
(...)
247
248 \"\"\"
249 self._storage_options = storage_options
--> 250 self._table = RawDeltaTable(
251 str(table_uri),
252 version=version,
253 storage_options=storage_options,
254 without_files=without_files,
255 log_buffer_size=log_buffer_size,
256 )
257 self._metadata = Metadata(self._table)
OSError: Encountered object with invalid path: Error parsing Path \"/private/var/folders/2h/xg9cljwj14d_nspp3fjk4pyw0000gn/T/tmpqh27vtamcom~apple~CloudDocs\": Encountered illegal character sequence \"~\" whilst parsing path segment \"tmpqh27vtamcom~apple~CloudDocs\""
}
This is coming from object_store in Apache Arrow; it may be necessary to raise an issue there: https://github.com/apache/arrow-rs/blob/91acfb07a9929a2d6721c5417e47c0c472372a86/object_store/src/path/parts.rs#L91C15
Path safety has been relaxed in the most recent version of object_store 0.8
Any idea when the update might be propagated to this library?
We're currently working on getting DataFusion updated; it should land at some point in the next few weeks.