[FR]: More granular `py_image_layer` behavior

Open arrdem opened this issue 10 months ago • 0 comments

Today py_image_layer takes the .runfiles tree from a py_binary (including targets built with the new py_venv_* rules) and by default attempts to split that single input filegroup into three layers:

  1. The "interpreter", assuming that the user is leaning on hermetic interpreters from Astral.
  2. The site-packages tree(s), which allows for layer reuse between images which share the same total set of 3rdparty input files.
  3. User files / everything else.
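To make the three-way split concrete, here is a minimal Python sketch of classifying runfiles paths into those layers. It is an illustration only, not the rules_py implementation: the regexes, the layer names, and the `classify` / `split_layers` helpers are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical patterns; the matching rules_py actually applies may differ.
SITE_PACKAGES = re.compile(r"/site-packages/")
INTERPRETER = re.compile(r"\.runfiles/[^/]*python[^/]*/")


def classify(path):
    """Assign a runfiles path to one of three coarse layers."""
    if SITE_PACKAGES.search(path):
        return "packages"
    if INTERPRETER.search(path):
        return "interpreter"
    return "default"


def split_layers(paths):
    """Group paths by layer name, preserving input order within each layer."""
    layers = defaultdict(list)
    for path in paths:
        layers[classify(path)].append(path)
    return dict(layers)
```

Every file under any site-packages directory lands in the one "packages" group, so that layer's content changes whenever any single dependency does.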

While this makes sense as a basic splitting strategy, it has two notable shortcomings:

  1. The interpreter has its own site-packages files which are part of the interpreter installation. Today these get pulled into the site-packages layer, when they should instead be part of the interpreter layer.
  2. The single site-packages layer is far too coarse-grained. Especially in the presence of heavy Python dependencies such as Torch, NumPy, Simplejpeg or TensorFlow, a single big site-packages layer fails to capitalize on the potential for layer/content reuse. Editing a single dependency invalidates the entire 3rdparty layer.
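As a rough sketch of what a finer-grained split could look like, the Python below checks the interpreter root first (so the interpreter's bundled site-packages stay with it) and then gives each third-party package its own layer. This is not a proposed implementation: the path layout, the regexes, the `pkg:` naming, and the `layer_for` helper are assumptions for illustration. It also ignores single-file modules placed directly in site-packages.

```python
import re

# Hypothetical layout: the interpreter lives in a runfiles repo whose name contains "python".
INTERPRETER_ROOT = re.compile(r"\.runfiles/[^/]*python[^/]*/")
# Capture the first path component under site-packages (package or dist-info dir).
SITE_PACKAGES_ENTRY = re.compile(r"/site-packages/([^/]+)")


def layer_for(path):
    """Return a layer name for a runfiles path."""
    # Interpreter first: its own site-packages belong to the interpreter layer.
    if INTERPRETER_ROOT.search(path):
        return "interpreter"
    match = SITE_PACKAGES_ENTRY.search(path)
    if match:
        entry = match.group(1)
        # Fold "numpy-1.26.4.dist-info" into the same layer as "numpy".
        name = re.split(r"-\d", entry)[0]
        return "pkg:" + name.lower()
    return "default"
```

With a split like this, editing one dependency would only change that package's layer, leaving the layers for the other packages and the interpreter untouched.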

Making this better requires rethinking how py_image_layer is implemented, but I think there is room for improvement here.

As noted by @thesayyn, there are also some fairly tricky design constraints: the goal is not just to enable layer reuse to reduce transfer size, but also to enable action reuse.

arrdem · May 01 '25 17:05