Run benchmarks using python library
Which issue does this PR close?
Closes https://github.com/apache/arrow-ballista/issues/363
Rationale for this change
Ensures the Python bindings are tested as part of the integration tests.
What changes are included in this PR?
- Execution of TPCH benchmarks using the python client
- Build Ballista python library as part of integration tests to enable the above benchmarks
Open points for discussion
- Changed the builder base image from buster to bullseye: buster ships Python 3.7 as its default Python, and numpy 1.22 has dropped support for 3.7. Bullseye provides 3.9.2 by default.
- The generated wheel has the version and platform tags in its name (e.g. ballista-0.8.0-cp37-abi3-manylinux_2_31_x86_64.whl). Currently a wildcard pattern is used to copy it into the Docker image.
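Because the wheel filename changes with the version and platform tags, a wildcard that matches nothing only shows up once the Docker build runs. A minimal sketch of a pre-build check, assuming the wheels land in python/target/wheels/; find_wheel is a hypothetical helper, not something in this PR:

```shell
# Hypothetical helper: print the path of the single Ballista wheel in a
# directory, or fail with a message if there is not exactly one.
find_wheel() {
    wheel_dir="$1"
    # Re-use the positional parameters to hold the glob matches (POSIX sh).
    set -- "$wheel_dir"/ballista-*-manylinux*.whl
    # An unmatched glob stays literal, so also check that the match exists.
    if [ "$#" -ne 1 ] || [ ! -e "$1" ]; then
        echo "expected exactly one ballista wheel in $wheel_dir" >&2
        return 1
    fi
    printf '%s\n' "$1"
}
```

It could be called as `wheel=$(find_wheel python/target/wheels) || exit 1` before the image build, so a missing wheel is reported up front.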
This is fantastic :heart:. Thank you @rahulpenti. I tried this locally and ran into an error:
Step 11/17 : COPY python/target/wheels/ballista-*-manylinux*.whl /root/
COPY failed: no source files were specified
Traceback (most recent call last):
File "/usr/bin/docker-compose", line 11, in <module>
load_entry_point('docker-compose==1.25.0', 'console_scripts', 'docker-compose')()
File "/usr/lib/python3/dist-packages/compose/cli/main.py", line 72, in main
command()
File "/usr/lib/python3/dist-packages/compose/cli/main.py", line 128, in perform_command
handler(command, command_options)
File "/usr/lib/python3/dist-packages/compose/cli/main.py", line 292, in build
self.project.build(
File "/usr/lib/python3/dist-packages/compose/project.py", line 397, in build
build_service(service)
File "/usr/lib/python3/dist-packages/compose/project.py", line 380, in build_service
service.build(no_cache, pull, force_rm, memory, build_args, gzip, rm, silent, cli, progress)
File "/usr/lib/python3/dist-packages/compose/service.py", line 1108, in build
all_events = list(stream_output(build_output, output_stream))
File "/usr/lib/python3/dist-packages/compose/progress_stream.py", line 25, in stream_output
for event in utils.json_stream(output):
File "/usr/lib/python3/dist-packages/compose/utils.py", line 61, in split_buffer
for data in stream_as_text(stream):
File "/usr/lib/python3/dist-packages/compose/utils.py", line 37, in stream_as_text
for data in stream:
File "/usr/lib/python3/dist-packages/compose/service.py", line 1816, in build
with open(iidfile) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpvr7s1glz'
fyi @avantgardnerio
@andygrove Can you share the contents of python/target/wheels/, which should contain the Python package wheel (.whl)? If there are no files, can you rebuild the ballista-builder Docker image?
@rahulpenti I ran this again. There is no wheels directory under python/target. I ran docker system prune -a beforehand to ensure that the builder Docker image was rebuilt.
That's weird. Given that you rebuilt the Docker image, can you manually run this command and share the logs? The maturin build command in the entrypoint script is responsible for building the wheels.
Ok, here is the issue. The script silently failed.
+ cd /home/builder/workspace/python
+ python3 -m venv venv
Error: [Errno 2] No such file or directory: '/home/builder/workspace/python/venv/bin/python3'
+ source venv/bin/activate
followed by:
+ python3 -m pip install -U pip
/usr/bin/python3: No module named pip
+ python3 -m pip install -r requirements-310.txt
/usr/bin/python3: No module named pip
+ maturin build
/home/builder/builder-entrypoint.sh: line 35: maturin: command not found
We should add set -e to the entrypoint script so that it fails at the first error, like this:
+ python3 -m venv venv
Error: [Errno 2] No such file or directory: '/home/builder/workspace/python/venv/bin/python3'
See https://github.com/apache/arrow-ballista/pull/444
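The effect of set -e can be seen in isolation. The sketch below is a stand-alone demonstration, not the entrypoint script itself: `false` stands in for a failing build step, and each variant runs in its own child shell.

```shell
# Without set -e, the shell carries on after the failing command.
without_errexit=$(sh -c 'false; echo "continued"' || true)

# With set -e, the shell exits at the first failing command,
# so the echo never runs.
with_errexit=$(sh -c 'set -e; false; echo "continued"' || true)

echo "without set -e: ${without_errexit:-<no output>}"  # prints: without set -e: continued
echo "with set -e: ${with_errexit:-<no output>}"        # prints: with set -e: <no output>
```

This matches the log above, where every step after the failed venv creation still ran and failed in turn.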
@andygrove I was able to do a clean build (docker system prune and a fresh git clone to ensure there are no old artifacts) on both an Intel Mac and an M1. Which OS are you using? It shouldn't matter, though, given that it runs inside Docker.
The error message above is also confusing: the python3 -m venv venv command should run with the system default Python, not the virtualenv one, while the next few commands should use the virtualenv and not the system default (inferred from the Python paths in the log).
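One way to make this kind of mismatch visible is to check, right after activation, that python3 resolves inside the virtualenv. A minimal sketch, assuming the standard activate behaviour of exporting VIRTUAL_ENV and prepending its bin directory to PATH; require_venv_python is a hypothetical helper, not part of the entrypoint script:

```shell
# Hypothetical guard: succeed only if python3 resolves inside the
# currently active virtualenv.
require_venv_python() {
    if [ -z "${VIRTUAL_ENV:-}" ]; then
        echo "no virtualenv is active" >&2
        return 1
    fi
    # command -v prints the path the shell would execute for python3.
    resolved=$(command -v python3) || return 1
    case "$resolved" in
        "$VIRTUAL_ENV"/bin/*) return 0 ;;
        *)
            echo "python3 resolves to $resolved, outside $VIRTUAL_ENV" >&2
            return 1
            ;;
    esac
}
```

Called after the activate step, it would stop the script before pip and maturin run against the system interpreter.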
@rahulpenti I am also confused. I am on Ubuntu 20.04.4 LTS. I will investigate this more before the next release (so sometime in the next week or two).