
Problem writing tables in directory named with char `~`

Open bglezseoane opened this issue 2 years ago • 4 comments

Environment

Delta-rs version: Python deltalake 0.12.0. Binding: Python deltalake 0.12.0. Environment: Local, Python 3.11. OS: Mac OS.


Bug

What happened:

I am working on a local macOS machine (ARM) and using the Python deltalake binding to read and write Delta tables. I need to store some Delta tables in the directory synced with iCloud, which is named $HOME/Library/Mobile Documents/com~apple~CloudDocs. I also have a symlink in my home directory, $HOME/iCloud, pointing to this location, which I typically use instead of the full path.

When I call the write_deltalake function, it raises the error OSError: Encountered object with invalid path: Error parsing Path "/Users/***/Library/Mobile%20Documents/com~apple~CloudDocs/testing_deltalake/df": Encountered illegal character sequence "~" whilst parsing path segment "com~apple~CloudDocs". I am passing the symlink in the write_deltalake call, so it seems that:

  1. write_deltalake resolves the symlink.
  2. write_deltalake has problems handling the ~ character.

A similar error occurs when using DeltaTable to load an existing table.
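To illustrate the first point, here is a minimal sketch using only the Python standard library. It does not show what write_deltalake does internally (that is not confirmed here); it only shows that once a symlink is resolved, the `~` in the target directory name reappears in the path. The directory names are stand-ins created in a temporary location, and the helper names are hypothetical.

```python
import os
import tempfile


def resolved_target(link_path: str) -> str:
    """Return the fully resolved filesystem path behind link_path."""
    return os.path.realpath(link_path)


def make_demo_symlink() -> tuple[str, str]:
    """Create a directory named like the iCloud one, plus a symlink to it.

    Returns (symlink path, resolved path). Both live under a fresh
    temporary directory, so nothing outside of it is touched.
    """
    base = tempfile.mkdtemp()
    real_dir = os.path.join(base, "com~apple~CloudDocs")
    os.mkdir(real_dir)
    link = os.path.join(base, "iCloud")  # stand-in for $HOME/iCloud
    os.symlink(real_dir, link)
    return link, resolved_target(link)


if __name__ == "__main__":
    link, resolved = make_demo_symlink()
    print("symlink: ", link)
    print("resolved:", resolved)
```

The symlink path itself contains no `~` in its last segment, but the resolved path does, which is consistent with the error message naming the segment "com~apple~CloudDocs" even though the symlink was passed.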

What you expected to happen:

Leaving aside that the directory syncs with iCloud, and whether storing Delta tables there is good practice, it is a fully valid path on this OS and I think it should be possible to read and write tables in it. In fact, the error persists for any directory whose name contains ~.
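As a pre-flight check while the problem exists, the following sketch lists which segments of a path contain `~`, mirroring the "path segment" wording of the error message. It is a hypothetical helper, not part of deltalake, and it only tests for `~`; it makes no claim about which other characters may be rejected. The example path uses a made-up user name.

```python
from pathlib import PurePosixPath


def segments_with_tilde(path: str) -> list[str]:
    """Return the path segments that contain a '~' character.

    Hypothetical helper: it only inspects the string and never
    touches the filesystem or resolves symlinks.
    """
    return [part for part in PurePosixPath(path).parts if "~" in part]


if __name__ == "__main__":
    example = (
        "/Users/example/Library/Mobile Documents/"
        "com~apple~CloudDocs/testing_deltalake/df"
    )
    print(segments_with_tilde(example))
```

Because symlinks appear to be resolved before the path is parsed, such a check is more useful on the resolved path than on the path as typed.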

I leave it to the developers to decide whether symbolic links should be resolved; I lack the context to take a position on this.

How to reproduce it:

from tempfile import mkdtemp

import pandas as pd
from deltalake import write_deltalake

df = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5],
        "fruits": ["banana", "orange", "mango", "apple", "banana"],
    }
)

# dest = Path().home() / "iCloud" / "testing_deltalake" / "df"
dest = mkdtemp(suffix="com~apple~CloudDocs")

write_deltalake(
    table_or_uri=dest,
    data=df,
)

Error:

{
	"name": "OSError",
	"message": "Encountered object with invalid path: Error parsing Path \"/private/var/folders/2h/xg9cljwj14d_nspp3fjk4pyw0000gn/T/tmpqh27vtamcom~apple~CloudDocs\": Encountered illegal character sequence \"~\" whilst parsing path segment \"tmpqh27vtamcom~apple~CloudDocs\"",
	"stack": "---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
Cell In[13], line 16
     13 # dest = Path().home() / \"iCloud\" / \"testing_deltalake\" / \"df\"
     14 dest = mkdtemp(suffix=\"com~apple~CloudDocs\")
---> 16 write_deltalake(
     17     table_or_uri=dest,
     18     data=df,
     19 )

File ~/.cache/pypoetry/virtualenvs/***-***-py3.11/lib/python3.11/site-packages/deltalake/writer.py:153, in write_deltalake(table_or_uri, data, schema, partition_by, filesystem, mode, file_options, max_partitions, max_open_files, max_rows_per_file, min_rows_per_group, max_rows_per_group, name, description, configuration, overwrite_schema, storage_options, partition_filters, large_dtypes)
    150     else:
    151         data, schema = delta_arrow_schema_from_pandas(data)
--> 153 table, table_uri = try_get_table_and_table_uri(table_or_uri, storage_options)
    155 # We need to write against the latest table version
    156 if table:

File ~/.cache/pypoetry/virtualenvs/***-***-py3.11/lib/python3.11/site-packages/deltalake/writer.py:417, in try_get_table_and_table_uri(table_or_uri, storage_options)
    414     raise ValueError(\"table_or_uri must be a str, Path or DeltaTable\")
    416 if isinstance(table_or_uri, (str, Path)):
--> 417     table = try_get_deltatable(table_or_uri, storage_options)
    418     table_uri = str(table_or_uri)
    419 else:

File ~/.cache/pypoetry/virtualenvs/***-***-py3.11/lib/python3.11/site-packages/deltalake/writer.py:430, in try_get_deltatable(table_uri, storage_options)
    426 def try_get_deltatable(
    427     table_uri: Union[str, Path], storage_options: Optional[Dict[str, str]]
    428 ) -> Optional[DeltaTable]:
    429     try:
--> 430         return DeltaTable(table_uri, storage_options=storage_options)
    431     except TableNotFoundError:
    432         return None

File ~/.cache/pypoetry/virtualenvs/***-***-py3.11/lib/python3.11/site-packages/deltalake/table.py:250, in DeltaTable.__init__(self, table_uri, version, storage_options, without_files, log_buffer_size)
    231 \"\"\"
    232 Create the Delta Table from a path with an optional version.
    233 Multiple StorageBackends are currently supported: AWS S3, Azure Data Lake Storage Gen2, Google Cloud Storage (GCS) and local URI.
   (...)
    247 
    248 \"\"\"
    249 self._storage_options = storage_options
--> 250 self._table = RawDeltaTable(
    251     str(table_uri),
    252     version=version,
    253     storage_options=storage_options,
    254     without_files=without_files,
    255     log_buffer_size=log_buffer_size,
    256 )
    257 self._metadata = Metadata(self._table)

OSError: Encountered object with invalid path: Error parsing Path \"/private/var/folders/2h/xg9cljwj14d_nspp3fjk4pyw0000gn/T/tmpqh27vtamcom~apple~CloudDocs\": Encountered illegal character sequence \"~\" whilst parsing path segment \"tmpqh27vtamcom~apple~CloudDocs\""
}

bglezseoane, Nov 05 '23 19:11

This is coming from object_store in Apache Arrow; it may be necessary to raise an issue there: https://github.com/apache/arrow-rs/blob/91acfb07a9929a2d6721c5417e47c0c472372a86/object_store/src/path/parts.rs#L91C15

r3stl355, Nov 06 '23 19:11

Path safety has been relaxed in the most recent version of object_store 0.8

tustvold, Nov 08 '23 02:11

> Path safety has been relaxed in the most recent version of object_store 0.8

Any idea when the update might be propagated to this library?

bglezseoane, Nov 14 '23 23:11

We're currently working on getting DataFusion updated; it should land at some point in the next few weeks.

tustvold, Nov 15 '23 10:11