
Bug: optimize_lakehouse_tables function fails on tables with deletion vectors in pure Python notebooks

Open crazy-treyn opened this issue 9 months ago • 1 comment

Describe the bug Running the lakehouse.optimize_lakehouse_tables function on tables with deletion vectors enabled returns an error in a Fabric pure Python notebook.

To Reproduce Steps to reproduce the behavior:

  1. Create a Delta table in a Fabric Lakehouse with deletion vectors enabled.
  2. Create a pure Python (not PySpark) notebook.
  3. Run %pip install semantic-link-labs --upgrade.
  4. Restart the Python kernel.
  5. Import sempy_labs and run the lakehouse.optimize_lakehouse_tables function on the table with deletion vectors enabled.
  6. The Python notebook will return the following error:
File ~/jupyter-env/python3.11/lib/python3.11/site-packages/sempy/_utils/_log.py:371, in mds_log.<locals>.get_wrapper.<locals>.log_decorator_wrapper(*args, **kwargs)
    368 start_time = time.perf_counter()
    370 try:
--> 371     result = func(*args, **kwargs)
    373     # The invocation for get_message_dict moves after the function
    374     # so it can access the state after the method call
    375     message.update(extractor.get_completion_message_dict(result, arg_dict))

File ~/jupyter-env/python3.11/lib/python3.11/site-packages/sempy_labs/lakehouse/_lakehouse.py:100, in optimize_lakehouse_tables(tables, lakehouse, workspace)
     98 path = r["Location"]
     99 bar.set_description(f"Optimizing the '{table_name}' table...")
--> 100 _optimize_table(path=path)

File ~/jupyter-env/python3.11/lib/python3.11/site-packages/sempy_labs/lakehouse/_lakehouse.py:41, in _optimize_table(path)
     38 if _pure_python_notebook():
     39     from deltalake import DeltaTable
---> 41     DeltaTable(path).optimize.compact()
     42 else:
     43     from delta import DeltaTable

File ~/jupyter-env/python3.11/lib/python3.11/site-packages/deltalake/table.py:1899, in TableOptimizer.compact(self, partition_filters, target_size, max_concurrent_tasks, min_commit_interval, writer_properties, custom_metadata)
   1896 if isinstance(min_commit_interval, timedelta):
   1897     min_commit_interval = int(min_commit_interval.total_seconds())
-> 1899 metrics = self.table._table.compact_optimize(
   1900     partition_filters,
   1901     target_size,
   1902     max_concurrent_tasks,
   1903     min_commit_interval,
   1904     writer_properties._to_dict() if writer_properties else None,
   1905     custom_metadata,
   1906 )
   1907 self.table.update_incremental()
   1908 return json.loads(metrics)

CommitFailedError: Unsupported reader features required: [DeletionVectors]

Expected behavior OPTIMIZE on the table completes successfully instead of erroring out.

Desktop (please complete the following information):

  • OS: Windows 11
  • Browser: Edge
  • Version: 136.0.3240.76 (Official build) (64-bit)

crazy-treyn avatar May 21 '25 18:05 crazy-treyn

delta-rs (the library used for optimizing and vacuuming when using a pure Python notebook) does not support deletion vectors. In that case, use a PySpark notebook.
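Until delta-rs supports deletion vectors, one defensive option is to check the table's protocol before compacting and route deletion-vector tables to a Spark notebook instead. A minimal sketch; the helper name and the skip/redirect behavior are illustrative and not part of semantic-link-labs, and the commented usage assumes deltalake's DeltaTable.protocol() API:

```python
from typing import Iterable, Optional


def needs_spark_optimize(reader_features: Optional[Iterable[str]]) -> bool:
    """Return True when a table's reader features include deletion vectors,
    i.e. delta-rs compaction would raise CommitFailedError."""
    # The Delta protocol spells this feature "deletionVectors".
    return bool(reader_features) and "deletionVectors" in set(reader_features)


# Illustrative usage with deltalake (not runnable outside a lakehouse):
#
# from deltalake import DeltaTable
# dt = DeltaTable(path)
# features = dt.protocol().reader_features  # may be None on older protocols
# if needs_spark_optimize(features):
#     print(f"Skipping '{path}': deletion vectors require a Spark notebook.")
# else:
#     dt.optimize.compact()
```

This keeps the pure Python path usable for tables that delta-rs can handle, while failing fast (with a clear message) on the ones it cannot.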

m-kovalsky avatar May 22 '25 06:05 m-kovalsky