semantic-link-labs
Bug: optimize_lakehouse_tables function fails on tables with deletion vectors in pure Python notebooks
Describe the bug
Running the `lakehouse.optimize_lakehouse_tables` function on a table with deletion vectors enabled returns an error in a Fabric pure Python notebook.
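Concretely, the failing call looks like the sketch below. The lakehouse and table names are hypothetical; `optimize_lakehouse_tables` accepts `tables`, `lakehouse`, and `workspace` parameters, as shown in the traceback further down.

```python
# Sketch of the failing call from a Fabric pure Python notebook.
# "sales" and "MyLakehouse" are hypothetical names; sempy_labs is only
# available inside a Fabric notebook session.

def reproduce(lakehouse_name="MyLakehouse"):
    from sempy_labs import lakehouse

    # Raises CommitFailedError: Unsupported reader features required:
    # [DeletionVectors] for any table with deletion vectors enabled.
    lakehouse.optimize_lakehouse_tables(
        tables="sales",
        lakehouse=lakehouse_name,
    )
```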
To Reproduce
Steps to reproduce the behavior:
- Create a Delta table in a Fabric Lakehouse with deletion vectors enabled.
- Create a pure Python (not PySpark) notebook.
- Run `%pip install semantic-link-labs --upgrade` and restart the Python kernel.
- Run `import sempy_labs` and call the `lakehouse.optimize_lakehouse_tables` function on the table with deletion vectors enabled.
- The Python notebook returns the following error:
File ~/jupyter-env/python3.11/lib/python3.11/site-packages/sempy/_utils/_log.py:371, in mds_log.<locals>.get_wrapper.<locals>.log_decorator_wrapper(*args, **kwargs)
368 start_time = time.perf_counter()
370 try:
--> 371 result = func(*args, **kwargs)
373 # The invocation for get_message_dict moves after the function
374 # so it can access the state after the method call
375 message.update(extractor.get_completion_message_dict(result, arg_dict))
File ~/jupyter-env/python3.11/lib/python3.11/site-packages/sempy_labs/lakehouse/_lakehouse.py:100, in optimize_lakehouse_tables(tables, lakehouse, workspace)
98 path = r["Location"]
99 bar.set_description(f"Optimizing the '{table_name}' table...")
--> 100 _optimize_table(path=path)
File ~/jupyter-env/python3.11/lib/python3.11/site-packages/sempy_labs/lakehouse/_lakehouse.py:41, in _optimize_table(path)
38 if _pure_python_notebook():
39 from deltalake import DeltaTable
---> 41 DeltaTable(path).optimize.compact()
42 else:
43 from delta import DeltaTable
File ~/jupyter-env/python3.11/lib/python3.11/site-packages/deltalake/table.py:1899, in TableOptimizer.compact(self, partition_filters, target_size, max_concurrent_tasks, min_commit_interval, writer_properties, custom_metadata)
1896 if isinstance(min_commit_interval, timedelta):
1897 min_commit_interval = int(min_commit_interval.total_seconds())
-> 1899 metrics = self.table._table.compact_optimize(
1900 partition_filters,
1901 target_size,
1902 max_concurrent_tasks,
1903 min_commit_interval,
1904 writer_properties._to_dict() if writer_properties else None,
1905 custom_metadata,
1906 )
1907 self.table.update_incremental()
1908 return json.loads(metrics)
CommitFailedError: Unsupported reader features required: [DeletionVectors]
Expected behavior
OPTIMIZE on the table completes successfully instead of erroring out.
Desktop (please complete the following information):
- OS: Windows 11
- Browser: Edge
- Version: 136.0.3240.76 (Official build) (64-bit)
delta-rs (the library used for table maintenance operations such as optimize and vacuum in pure Python notebooks) does not support deletion vectors. For tables with deletion vectors enabled, use a PySpark notebook instead.
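A caller can detect affected tables up front by inspecting the table protocol and routing them to Spark instead of delta-rs. Below is a minimal sketch, not the library's actual fix: the helper name is hypothetical, and it assumes recent delta-rs releases expose `reader_features` on `DeltaTable.protocol()` (the protocol feature name is `deletionVectors`).

```python
def requires_spark_optimize(reader_features):
    """Return True when the table's reader features include deletion
    vectors, which delta-rs cannot compact (it raises CommitFailedError),
    so the table must be optimized from a PySpark notebook instead."""
    return "deletionVectors" in (reader_features or [])

# In a pure Python notebook the feature list could be read via delta-rs,
# e.g. (assumes a recent deltalake release):
#   from deltalake import DeltaTable
#   features = DeltaTable(path).protocol().reader_features
assert requires_spark_optimize(["deletionVectors", "columnMapping"])
assert not requires_spark_optimize(None)  # older protocol: no reader features
```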