
Providing a hermetic python toolchain

Open • arlyon opened this issue 3 years ago • 11 comments

For serious use cases we would benefit from a system-independent Python toolchain that can source its own interpreter. At the most basic level, this means downloading a copy of CPython for the current platform that can run code. The catch is that a standard CPython build dynamically links against system libraries such as libssl and libsqlite, so we need either a mechanism for building those consistently (which requires a C/C++ compiler) or an interpreter with static linking such as https://python-build-standalone.readthedocs.io
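As a sketch of the "source an interpreter for the current platform" step, the helper below builds a python-build-standalone release URL from a version, a release tag, and a platform. The naming pattern is inferred from the single archive URL quoted later in this thread. The helper names (`standalone_archive_url`, `_TRIPLES`), the `(os, cpu)` keys, and every target triple other than `x86_64-unknown-linux-gnu` are assumptions. Which flavors (such as `pgo-full`) are published also varies by platform and release, so check the release page before relying on a given URL. The code avoids f-strings, imports, and exceptions so it stays close to the Python subset that Starlark accepts.

```python
# Assumed mapping from (os, cpu) to the target triples that appear in
# python-build-standalone asset names. Only the Linux x86_64 triple is taken
# from the archive URL quoted in this thread; the others are assumptions.
_TRIPLES = {
    ("linux", "x86_64"): "x86_64-unknown-linux-gnu",
    ("linux", "arm64"): "aarch64-unknown-linux-gnu",
    ("macos", "x86_64"): "x86_64-apple-darwin",
    ("macos", "arm64"): "aarch64-apple-darwin",
}

_RELEASES = "https://github.com/indygreg/python-build-standalone/releases/download"

def standalone_archive_url(python_version, release, target_os, target_cpu, flavor = "pgo-full"):
    """Returns the archive URL for the given platform, or None if unknown.

    Follows the pattern cpython-<version>+<release>-<triple>-<flavor>.tar.zst.
    """
    triple = _TRIPLES.get((target_os, target_cpu))
    if triple == None:  # Starlark has no `is` operator, so compare with ==.
        return None
    return "%s/%s/cpython-%s+%s-%s-%s.tar.zst" % (
        _RELEASES,
        release,
        python_version,
        release,
        triple,
        flavor,
    )

# Reconstructs the Linux x86_64 URL used in the BUCK file later in this thread.
print(standalone_archive_url("3.8.18", "20231002", "linux", "x86_64"))
```

Keying the lookup on the target platform, rather than probing the machine running the build, keeps the choice of archive explicit when host and target differ.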

Potential learnings from Bazel

  • dependencies are managed in a repository rule, so they are built with the local (host) toolchain rather than the toolchain Bazel itself uses
  • cross-compilation is tough (effectively broken)

arlyon, Oct 05 '22

Here's what Pants is trying to do: https://github.com/pantsbuild/pants/issues/7369

danmx, Dec 04 '22

You might want to look into PyOxidizer (https://github.com/indygreg/PyOxidizer), the PyOxy Python runner (https://gregoryszorc.com/blog/2022/05/10/announcing-the-pyoxy-python-runner/), and python-build-standalone (https://github.com/indygreg/python-build-standalone/).

LegNeato, Dec 08 '22

Ah, I see it's already mentioned in the Pants issue.

LegNeato, Dec 08 '22

Hi all, we are releasing a hermetic toolchain based on indygreg/python-build-standalone this week. (also hi @LegNeato, I submitted a few patches for juniper a couple of years back :) )

arlyon, Dec 20 '22

I'm currently looking for a way to bundle a static Python build with a Rust program. The closest I've found is https://github.com/python-cmake-buildsystem/python-cmake-buildsystem (although it only goes up to 3.6), which replaces CPython's configure logic with CMake scripts. Presumably buck2 would need the same thing, so that a Rust/C++ program built by buck2 could statically link against CPython. This conceptually easy task turns out to be poorly supported almost everywhere (see also https://pyo3.rs/v0.14.0/building_and_distribution#statically-embedding-the-python-interpreter).

This is useful if you need or want Python but don't rely much on its ecosystem of packages, and just want a scripting language that many developers already know for some parts of a system.

(python-build-standalone seems to be a wrapper around the configure scripts of the core CPython distribution, and it relies on Docker, which is a step I'd like to avoid.)

benmkw, Apr 22 '23

This is a little bit hacky, but I was asked to share how I fixed this problem:

(Please ignore the fact that this is Python 3.8 😓)

BUCK

# Assumes defs.bzl lives in this same package (toolchains//python).
load(":defs.bzl", "standalone_python")

http_archive(
    name = "python-standalone-archive",
    # TODO self host this
    urls = ["https://github.com/indygreg/python-build-standalone/releases/download/20231002/cpython-3.8.18+20231002-x86_64-unknown-linux-gnu-pgo-full.tar.zst"],
    sha256 = "3209542fbcaf7c3ef5658b344ea357c4aabf5fe7cbf1b5dea4a0b78b64835fc0",
    visibility = ["PUBLIC"],
)

standalone_python(
    name = "python-standalone",
    archive = ":python-standalone-archive",
    visibility = ["PUBLIC"],
)

prebuilt_cxx_library(
    name = "python-headers",
    header_dirs = ["@toolchains//python:python-standalone[includes]"],
    visibility = ["PUBLIC"],
)

defs.bzl

                                                                                                
def _standalone_python_impl(ctx: AnalysisContext) -> list[Provider]:
    # Copy the extracted archive into this rule's own output directory.
    python = ctx.actions.declare_output("__python", dir = True)
    ctx.actions.copy_dir(python, ctx.attrs.archive)

    # Generate a runnable python3 command. Current buck2 takes `hidden` as a
    # keyword argument; older versions used the `.hidden(python)` method.
    interpreter = cmd_args(python, format = "{}/python/install/bin/python3", hidden = python)

    # Provide the relevant headers for pybind.
    includes = ctx.actions.declare_output("include", dir = True)
    ctx.actions.copy_dir(includes, python.project("python/install/include/python3.8"))

    return [
        DefaultInfo(sub_targets = {
            "interpreter": [RunInfo(interpreter)],
            "includes": [DefaultInfo(default_output = includes)],
        }),
    ]

standalone_python = rule(
    impl = _standalone_python_impl,
    attrs = {
        "archive": attrs.source(),
    },
)
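The rule above hard-codes `python3.8` in the include path. A small helper like the hypothetical `standalone_paths` below could derive both paths from a version string instead, for example one supplied through a new `version` attribute on the rule (also hypothetical). It assumes the `python/install/...` layout used above and Python 3.8 or later, where the include directory no longer carries an ABI suffix such as `m`. Like the rest of the .bzl code, it sticks to constructs Starlark also accepts.

```python
def standalone_paths(python_version):
    """Returns paths inside an extracted python-build-standalone "full" archive.

    Assumes the python/install/... layout used by the rule above and
    Python >= 3.8 (no ABI suffix on the include directory).
    """
    parts = python_version.split(".")
    major_minor = parts[0] + "." + parts[1]
    return {
        "interpreter": "python/install/bin/python3",
        "includes": "python/install/include/python" + major_minor,
    }

paths = standalone_paths("3.8.18")
print(paths["includes"])  # python/install/include/python3.8
```

Inside `_standalone_python_impl`, the hard-coded projection could then become `python.project(standalone_paths(ctx.attrs.version)["includes"])`, assuming such an attribute is added.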


toolchain//BUCK

load("@prelude//toolchains:python.bzl", "system_python_toolchain")

system_python_toolchain(
    name = "python",
    interpreter = "toolchains//python:python-standalone[interpreter]",
    visibility = ["PUBLIC"],
)
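The `http_archive` target pins the archive with `sha256`, so switching to another release or platform means recomputing that digest. A minimal standard-library sketch (the function name is hypothetical); `sha256sum` on Linux or `shasum -a 256` on macOS does the same job from the command line.

```python
import hashlib
import sys

def sha256_of_file(path, chunk_size=1 << 20):
    """Streams the file through SHA-256 so large archives aren't loaded into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

if __name__ == "__main__":
    # Usage: python3 sha256_of_file.py <downloaded-archive>.tar.zst
    for path in sys.argv[1:]:
        print(path, sha256_of_file(path))
```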

benbrittain, Nov 08 '23

Hey @benbrittain, quick question about that snippet. As far as I can tell, the interpreter attribute on system_python_toolchain expects a string naming the Python binary, e.g. python or python3. How did you get this to work by passing a reference to a RunInfo sub-target to the interpreter attribute? Or is this closer to pseudocode?

As it stands, if I run this code I get output like this:

$ buckle build //:thing-that-uses-python
Local command returned non-zero exit code <no exit code>
Reproduce locally: `env -- 'BUCK_SCRATCH_PATH=buck-out/v2/tmp/root/213ed1b7ab869379/__backend-npm-install__/npm' '//tool ...<omitted>... buck-out/v2/gen/root/213ed1b7ab869379/__backend-npm-install__/node_modules --environment development (run `buck2 log what-failed` to get the full command)`
stdout:
stderr:
Spawning executable `//toolchains/python:python-standalone[interpreter]` failed: Failed to spawn a process
$ buckle log what-failed
Showing commands from: /Users/dan/Library/Caches/buckle/a1226c67e221a84c1562008739dcb322710e86e2/buck2 build //:thing-that-uses-python
build	root//:thing-that-uses-python (prelude//platforms:default#213ed1b7ab869379) (npm)	local	env -- 'TMPDIR=/Users/dan/devel/backend2/buck-out/v2/tmp/root/213ed1b7ab869379/__backend-npm-install__/npm' 'BUCK_SCRATCH_PATH=buck-out/v2/tmp/root/213ed1b7ab869379/__backend-npm-install__/npm' 'BUCK2_DAEMON_UUID=89552eba-0149-445a-b0ee-f7d6ee0a0df3' 'BUCK_BUILD_ID=ad2238c7-8161-47d5-b3fb-598a98a18e23' '//toolchains/python:python-standalone[interpreter]' buck-out/v2/gen/prelude-replay/213ed1b7ab869379/npm/__npm_install.py__/npm_install.py --npm buck-out/v2/gen/toolchains/213ed1b7ab869379/__node-18.16.1__/node-18.16.1/bin/npm --package_json ./package.json --package_lock ./package-lock.json --output buck-out/v2/gen/root/213ed1b7ab869379/__backend-npm-install__/node_modules --environment development

Note how it just puts the literal string //toolchains/python:python-standalone[interpreter] into the command instead of resolving it.
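Why that fails can be reproduced outside buck2: when an attribute value is used verbatim as the program to run, the operating system looks for a file with that name, and a target label is not a file path. The sketch below is a plain-Python analogy, not buck2's own code; `try_spawn` is a hypothetical helper.

```python
import subprocess
import sys

def try_spawn(argv):
    """Starts argv[0] directly (no shell) and reports what happened."""
    try:
        completed = subprocess.run(argv, capture_output=True, check=False)
    except OSError as err:
        # FileNotFoundError and PermissionError are both OSError subclasses.
        return "failed to spawn: " + type(err).__name__
    return "exited with %d" % completed.returncode

# The literal label is looked up as a file path and is not found.
print(try_spawn(["//toolchains/python:python-standalone[interpreter]", "--version"]))
# A real interpreter path spawns fine.
print(try_spawn([sys.executable, "--version"]))
```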

jazzdan, Mar 01 '24

I believe you'll want something like "$(exe //toolchains/python:python-standalone[interpreter])" instead, which gets replaced by the location of the artifact (the interpreter, in this case). exe also ensures that the target provides a RunInfo; location is available when that isn't required.
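A toy model of what that substitution means: an attribute that supports macros has `$(exe ...)` replaced by the output path of the referenced target, while a plain string attribute is passed through untouched, which is how the literal label ends up in the failing command above. This is an illustration only: buck2 performs the real expansion during analysis, only for attributes that accept macros, and also checks that `exe` targets provide `RunInfo`, which this sketch does not model. The function name and the example output path are made up.

```python
import re

# Matches $(exe <target>) and $(location <target>).
_MACRO = re.compile(r"\$\((exe|location) ([^)]+)\)")

def expand_macros(value, outputs):
    """Replaces each macro with the path recorded for its target in `outputs`.

    A KeyError stands in for the error a real build would report when a
    target cannot be resolved.
    """
    return _MACRO.sub(lambda m: outputs[m.group(2)], value)

outputs = {
    # Hypothetical output path, for illustration only.
    "//toolchains/python:python-standalone[interpreter]": "buck-out/example/python/install/bin/python3",
}
# Expanded: prints the hypothetical path followed by " -V".
print(expand_macros("$(exe //toolchains/python:python-standalone[interpreter]) -V", outputs))
# A plain string (no macro) comes back unchanged.
print(expand_macros("//toolchains/python:python-standalone[interpreter]", outputs))
```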

cbarrete, Mar 01 '24

@cbarrete yeah, I tried that too, but it doesn't look like that attribute supports macros:

Spawning executable `$(exe @prelude-replay//python:python-standalone[interpreter])` failed: Failed to spawn a process
Showing commands from: /Users/dan/Library/Caches/buckle/a1226c67e221a84c1562008739dcb322710e86e2/buck2 build //:backend-npm-install
build	root//:backend-npm-install (prelude//platforms:default#213ed1b7ab869379) (npm)	local	env -- 'TMPDIR=/Users/dan/devel/backend2/buck-out/v2/tmp/root/213ed1b7ab869379/__backend-npm-install__/npm' 'BUCK_SCRATCH_PATH=buck-out/v2/tmp/root/213ed1b7ab869379/__backend-npm-install__/npm' 'BUCK2_DAEMON_UUID=89552eba-0149-445a-b0ee-f7d6ee0a0df3' 'BUCK_BUILD_ID=345f1930-17d7-4b88-b3da-be18602adcf4' '$(exe @prelude-replay//python:python-standalone[interpreter])' buck-out/v2/gen/prelude-replay/213ed1b7ab869379/npm/__npm_install.py__/npm_install.py --npm buck-out/v2/gen/toolchains/213ed1b7ab869379/__node-18.16.1__/node-18.16.1/bin/npm --package_json ./package.json --package_lock ./package-lock.json --output buck-out/v2/gen/root/213ed1b7ab869379/__backend-npm-install__/node_modules --environment development

jazzdan, Mar 01 '24

I ironed out some issues and got a working hermetic Python toolchain (at least on macOS x86, macOS arm64, and Linux x86; Windows is untested). I posted it here: https://github.com/jazzdan/buck2-python-toolchain-problem
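Once a toolchain like this builds, one way to confirm that actions really use the downloaded interpreter rather than a system Python is to run a short introspection script under it and compare the reported paths and library versions. A minimal sketch; the script is hypothetical and not part of the linked repository. It reports the OpenSSL and SQLite versions the `ssl` and `sqlite3` modules actually load, which can be compared with the system's, but that alone does not prove they were linked statically.

```python
import sys

def describe_interpreter():
    """Collects facts that distinguish a hermetic interpreter from a system one."""
    info = {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "version": "%d.%d.%d" % tuple(sys.version_info[:3]),
    }
    # ssl and sqlite3 are the stdlib modules backed by libssl and libsqlite.
    try:
        import ssl
        info["openssl"] = ssl.OPENSSL_VERSION
    except ImportError:
        info["openssl"] = None
    try:
        import sqlite3
        info["sqlite"] = sqlite3.sqlite_version
    except ImportError:
        info["sqlite"] = None
    return info

if __name__ == "__main__":
    for key, value in sorted(describe_interpreter().items()):
        print("%s: %s" % (key, value))
```

Run through the interpreter sub-target, the reported executable and prefix would be expected to point into buck-out rather than a system location such as /usr.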

jazzdan, Mar 12 '24