
[FEATURE]: Migrate direct filesystem access to UC tables

Open nfx opened this issue 7 months ago • 3 comments

The code contains instances of spark.read.format("delta").load("s3a://prefix/..."), which we want to migrate to spark.table("catalog.schema.table") to follow UC practices. This builds on top of "tables in mounts". See:

  • https://github.com/databrickslabs/ucx/issues/1225
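To make the intended rewrite concrete, here is a minimal sketch of looking up a UC table name for a storage path. The mapping contents, the bucket name, and the function name `table_for_path` are all hypothetical; where the real mappings are stored is still an open question in this issue.

```python
from typing import Optional

# Hypothetical mapping from storage locations to UC table names.
# Where this actually lives (e.g. the table mapping) is TBD in this issue.
PATH_TO_TABLE = {
    "s3a://example-bucket/sales/orders": "main.sales.orders",
    "s3a://example-bucket/sales": "main.sales.all_sales",
}


def table_for_path(path: str, mapping: dict[str, str]) -> Optional[str]:
    """Return the UC table whose location is the longest prefix of `path`."""
    normalized = path.rstrip("/")
    best_prefix = None
    for prefix in mapping:
        candidate = prefix.rstrip("/")
        # Match the location itself or anything underneath it,
        # but not siblings that merely share a string prefix.
        if normalized == candidate or normalized.startswith(candidate + "/"):
            if best_prefix is None or len(candidate) > len(best_prefix.rstrip("/")):
                best_prefix = prefix
    return mapping[best_prefix] if best_prefix is not None else None
```

With such a lookup, a call like spark.read.format("delta").load(path) could be rewritten to spark.table(name) whenever a table name is found, and left untouched (but reported) otherwise.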

Do we migrate to UC Volumes?

yes

Do we resolve mounts?

yes
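As an illustration of what resolving mounts could mean here, the sketch below translates a mount-based path into its underlying cloud location. The mount table and the function name `resolve_mount` are assumptions made for the example; in practice the mount points would come from the workspace.

```python
def resolve_mount(path: str, mounts: dict[str, str]) -> str:
    """Translate a /mnt/... path to its cloud location, if a mount matches.

    `mounts` maps mount points (e.g. "/mnt/raw") to cloud URLs. It is a
    hypothetical input for this sketch. Unmatched paths are returned as-is.
    """
    original = path
    # Normalize dbfs:/mnt/... and /dbfs/mnt/... to the plain /mnt/... form.
    for scheme in ("dbfs:", "/dbfs"):
        if path.startswith(scheme + "/"):
            path = path[len(scheme):]
            break
    # Try longer mount points first so nested mounts win over their parents.
    for mount_point in sorted(mounts, key=len, reverse=True):
        trimmed = mount_point.rstrip("/")
        if path == trimmed or path.startswith(trimmed + "/"):
            return mounts[mount_point].rstrip("/") + path[len(trimmed):]
    return original
```

The resolved cloud URL can then be matched against known table locations in the same way as a direct s3a:// path.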

Do we resolve dbutils.widgets.get()?

if possible

Where to store mappings? Add a prefix in the table mapping?

TBD

What scans all jobs?

  1. assessment workflow
  2. migration-progress (new) workflow on a daily schedule

See:

  • https://github.com/databrickslabs/ucx/issues/2074

What determines all direct filesystem accesses?

  • Extend FromDbfsFolder, DirectFilesystemAccessMatcher, and FromTable to return file access.
  • Add new matchers for open('/dbfs/...') literals.
  • Modify WorkflowLinter to persist this information in a new table.
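A new matcher for open('/dbfs/...') literals could work roughly as sketched below. This uses Python's standard-library ast module and a hypothetical function name, `find_dbfs_open_literals`; it is not the existing ucx matcher interface, and it only handles string literals passed directly as the first positional argument.

```python
import ast


def find_dbfs_open_literals(source: str) -> list[tuple[int, str]]:
    """Return (line number, path) for each open('/dbfs/...') call in `source`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        # Only plain calls to the builtin name `open` are considered.
        if not isinstance(node, ast.Call):
            continue
        if not (isinstance(node.func, ast.Name) and node.func.id == "open"):
            continue
        if not node.args:
            continue
        first = node.args[0]
        # Only literal string paths are detected; variables and f-strings
        # would need value inference, which this sketch does not attempt.
        if (
            isinstance(first, ast.Constant)
            and isinstance(first.value, str)
            and first.value.startswith("/dbfs/")
        ):
            hits.append((node.lineno, first.value))
    return sorted(hits)
```

Each hit carries the line number and the literal path, which is the kind of record that could be persisted per job in the new table mentioned above.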

nfx avatar Jul 02 '24 19:07 nfx