rules_py
[FR]: More granular `py_image_layer` behavior
Today `py_image_layer` takes the `.runfiles` tree from a `py_binary` (including the new `py_venv_*` rules) and by default attempts to break that one input filegroup into three layers:
- The "interpreter", assuming that the user is leaning on hermetic interpreters from Astral.
- The `site-packages` tree(s), which allows for layer reuse between images which share the same total set of 3rdparty input files.
- User files / everything else.
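For readers unfamiliar with the splitting, it can be pictured as a first-match classification over runfiles paths. The following is only a minimal sketch in Python: the regexes, the `python_toolchain` repository prefix, the pattern order, and the example paths are all assumptions made for illustration, not the actual defaults of `py_image_layer`.

```python
import re

# Hypothetical patterns, for illustration only. The real defaults in
# rules_py may differ in both content and order.
LAYER_PATTERNS = [
    ("packages", re.compile(r"\.runfiles/.*/site-packages/")),
    ("interpreter", re.compile(r"\.runfiles/python_toolchain[^/]*/")),
]


def classify(path: str) -> str:
    """Return the name of the first layer group whose pattern matches."""
    for name, pattern in LAYER_PATTERNS:
        if pattern.search(path):
            return name
    # Anything unmatched is treated as user files / everything else.
    return "default"


# Hypothetical runfiles paths.
EXAMPLE_PATHS = [
    "app.runfiles/python_toolchain_3_11/bin/python3",
    "app.runfiles/pypi_numpy/site-packages/numpy/__init__.py",
    "app.runfiles/_main/app/main.py",
]

if __name__ == "__main__":
    for p in EXAMPLE_PATHS:
        print(classify(p), p)
```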
While this makes sense as a basic splitting strategy, it leaves much to be desired.
- The interpreter has its own `site-packages` files which are part of the interpreter installation. Today these get pulled into the `site-packages` layer, but they should instead be part of the interpreter layer.
- The single `site-packages` layer is far too coarse-grained. Especially in the presence of heavy Python dependencies such as Torch, Numpy, Simplejpeg or Tensorflow, a single big `site-packages` layer fails to capitalize on the potential for layer/content reuse. Editing a single dependency invalidates the entire 3rdparty layer.
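To make the two points above concrete, here is a rough sketch in Python of what a finer-grained grouping could look like. It is not a proposal for the actual implementation. It assumes a hypothetical `python_toolchain` repository prefix for the interpreter and assumes each 3rdparty distribution is unpacked into its own repository directory under `.runfiles`; neither assumption is guaranteed to hold, in particular for merged venv-style trees.

```python
import re
from collections import defaultdict

# Hypothetical: the interpreter lives in a repository directory whose
# name starts with "python_toolchain".
INTERPRETER = re.compile(r"\.runfiles/python_toolchain[^/]*/")
# Hypothetical: each 3rdparty distribution has its own repository
# directory that contains a site-packages tree.
PACKAGE = re.compile(r"\.runfiles/([^/]+)/(?:.*/)?site-packages/")


def layer_for(path: str) -> str:
    """Pick a layer name for one runfiles path."""
    # Check the interpreter first so that its bundled site-packages
    # files stay in the interpreter layer.
    if INTERPRETER.search(path):
        return "interpreter"
    match = PACKAGE.search(path)
    if match:
        # One layer per distribution repository, so editing one
        # dependency leaves the other package layers untouched.
        return "pkg:" + match.group(1)
    return "default"


def group_layers(paths):
    """Group paths by layer name, preserving input order within a layer."""
    layers = defaultdict(list)
    for path in paths:
        layers[layer_for(path)].append(path)
    return dict(layers)
```

One layer per distribution is the most granular option. Since image formats and registries tend to impose practical limits on layer count, a real implementation would probably need a way to bucket small packages together.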
Making this better requires some rethinking of how `py_image_layer` is implemented, but I think there's some room here.
As noted by @thesayyn, there are also some fairly tricky design constraints around enabling not just layer reuse (for transfer size) but also action reuse.