
[FEATURE]: Pre-process packages available via the DBR without installation

Open asnare opened this issue 1 year ago • 4 comments

Is there an existing issue for this?

  • [X] I have searched the existing issues

Problem statement

The various Databricks Runtime (DBR) versions include many packages[^1] that are always available and do not need to be installed or declared by notebooks (or jobs): they can simply be used. At present our dependency tracking isn't aware of these.

[^1]: As an example, here is the list of packages for DBR 14.3.

Proposed Solution

The packages included in the various DBR versions should be included in the list of known packages that we maintain.

Additional Context

The published lists for each DBR version are roughly correct; it turns out that the base OS images used also include some packages. I've scanned most of the currently supported DBR versions (9.1, 10.4, 11.3, 12.2, 13.3, 14.1, 14.2, 14.3, 15.1 & 15.2) and produced this list of installed pip packages and the various versions in use across these runtimes.
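As a rough illustration of how per-runtime scans could be folded into a single list, the sketch below merges the output of `pip list --format=json` from several runtimes into one mapping of package name to the versions seen. The function names and the sample data are hypothetical and invented for illustration; they are not the script used for the scan above, and name normalization following PEP 503 is an assumption.

```python
import json
import re
from collections import defaultdict


def normalize(name: str) -> str:
    """Normalize a distribution name the way PEP 503 describes."""
    return re.sub(r"[-_.]+", "-", name).lower()


def merge_runtime_packages(pip_json_by_runtime: dict) -> dict:
    """Map each package name to {runtime: version} across all runtimes.

    `pip_json_by_runtime` maps a runtime label (e.g. "14.3") to the raw
    text produced by `pip list --format=json` on that runtime.
    """
    merged = defaultdict(dict)
    for runtime, raw_json in pip_json_by_runtime.items():
        for entry in json.loads(raw_json):
            merged[normalize(entry["name"])][runtime] = entry["version"]
    return dict(merged)


# Invented sample data: NOT the real contents of any DBR version.
sample = {
    "runtime-a": '[{"name": "example_pkg", "version": "1.0"}]',
    "runtime-b": '[{"name": "Example.Pkg", "version": "1.1"},'
                 ' {"name": "other-pkg", "version": "2.0"}]',
}
merged = merge_runtime_packages(sample)
known_names = sorted(merged)  # version-agnostic list of known packages
```

If versions turn out not to matter, only `known_names` is needed and the per-runtime version mapping can be discarded.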

asnare avatar Jun 25 '24 08:06 asnare

here are all the packages since DBR 9.x - https://github.com/databrickslabs/sandbox/blob/main/runtime-packages/sample-output.txt

we don't really care about specific versions of those packages, at least for now.

nfx avatar Jul 02 '24 18:07 nfx

> here are all the packages since DBR 9.x - https://github.com/databrickslabs/sandbox/blob/main/runtime-packages/sample-output.txt

Thanks for that, I wasn't aware of that tool and it looks quite useful.

The lists are roughly the same with a few differences here and there. Some notes:

  • I didn't enumerate the ML-runtimes.
  • The sandbox list is produced via `pkg_resources.working_set`, with a few things filtered out.
  • My version is based on `pip list --format=json`.
  • Both seem to miss a few things; the overlap is about 75%.

asnare avatar Jul 03 '24 08:07 asnare

I am okay with a "good enough" approach. Full coverage of the pre-installed packages would be great, but reaching 80% is good enough.

JCZuurmond avatar Jul 17 '24 12:07 JCZuurmond

Use `make known` to update `known.json`.

JCZuurmond avatar Aug 13 '24 13:08 JCZuurmond