datafusion-ballista

Run benchmarks using python library

Open · rahull-p opened this issue 3 years ago · 11 comments

Which issue does this PR close?

Closes https://github.com/apache/arrow-ballista/issues/363

Rationale for this change

Ensures that the Python bindings are tested as part of the integration tests.

What changes are included in this PR?

  • Execution of the TPC-H benchmarks using the Python client
  • Build the Ballista Python library as part of the integration tests to enable the benchmarks above

Open points for discussion

  • Changed the builder base image from buster to bullseye, since buster ships Python 3.7 as the default Python and NumPy 1.22 has dropped support for 3.7. Bullseye provides Python 3.9.2 by default.
  • The generated wheel includes the version and platform tags in its filename (e.g. ballista-0.8.0-cp37-abi3-manylinux_2_31_x86_64.whl). Currently a wildcard pattern is used to copy it into the Docker image.

rahull-p, Oct 22 '22
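Because the wheel filename changes with the version and tags, one option is to split it according to the standard wheel naming convention (PEP 427) instead of relying on an ad-hoc pattern. The following is a minimal sketch; split_wheel_name is a hypothetical helper, not part of the Ballista build scripts. For production use, the third-party packaging library's packaging.utils.parse_wheel_filename handles more edge cases.

```python
from typing import Dict


def split_wheel_name(filename: str) -> Dict[str, str]:
    """Split a wheel filename into its PEP 427 components (hypothetical helper).

    Layout: {name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}.whl
    Hyphens inside the name are escaped to underscores by wheel builders,
    so splitting on "-" is safe.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        # the optional build tag only exists when there are six components
        "build": parts[2] if len(parts) == 6 else "",
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }
```

For example, split_wheel_name("ballista-0.8.0-cp37-abi3-manylinux_2_31_x86_64.whl")["version"] returns "0.8.0". The cp37-abi3 pair marks a build against CPython's stable ABI from 3.7 onward, so the same wheel should install on bullseye's Python 3.9.2.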

This is fantastic :heart:. Thank you @rahulpenti. I tried this locally and ran into an error:

Step 11/17 : COPY python/target/wheels/ballista-*-manylinux*.whl /root/
COPY failed: no source files were specified
Traceback (most recent call last):
  File "/usr/bin/docker-compose", line 11, in <module>
    load_entry_point('docker-compose==1.25.0', 'console_scripts', 'docker-compose')()
  File "/usr/lib/python3/dist-packages/compose/cli/main.py", line 72, in main
    command()
  File "/usr/lib/python3/dist-packages/compose/cli/main.py", line 128, in perform_command
    handler(command, command_options)
  File "/usr/lib/python3/dist-packages/compose/cli/main.py", line 292, in build
    self.project.build(
  File "/usr/lib/python3/dist-packages/compose/project.py", line 397, in build
    build_service(service)
  File "/usr/lib/python3/dist-packages/compose/project.py", line 380, in build_service
    service.build(no_cache, pull, force_rm, memory, build_args, gzip, rm, silent, cli, progress)
  File "/usr/lib/python3/dist-packages/compose/service.py", line 1108, in build
    all_events = list(stream_output(build_output, output_stream))
  File "/usr/lib/python3/dist-packages/compose/progress_stream.py", line 25, in stream_output
    for event in utils.json_stream(output):
  File "/usr/lib/python3/dist-packages/compose/utils.py", line 61, in split_buffer
    for data in stream_as_text(stream):
  File "/usr/lib/python3/dist-packages/compose/utils.py", line 37, in stream_as_text
    for data in stream:
  File "/usr/lib/python3/dist-packages/compose/service.py", line 1816, in build
    with open(iidfile) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpvr7s1glz'

andygrove, Oct 22 '22

fyi @avantgardnerio

andygrove, Oct 22 '22

@andygrove Can you share the contents of python/target/wheels/? It should contain the Python package wheel (.whl). If there are no files, can you rebuild the ballista-builder Docker image?

rahull-p, Oct 23 '22
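The check requested above could be scripted so that a missing wheel is reported before docker build reaches the COPY step and fails with "no source files were specified". This is a minimal sketch under stated assumptions: find_wheels is a hypothetical helper, its default pattern mirrors the COPY line from the log above, and Python's fnmatch only approximates Docker's wildcard matching (it behaves the same for simple patterns like this one).

```python
import fnmatch
import os
from typing import List


def find_wheels(wheel_dir: str, pattern: str = "ballista-*-manylinux*.whl") -> List[str]:
    """Return wheel filenames in wheel_dir matching the COPY pattern (hypothetical helper).

    Raises FileNotFoundError when the directory is missing or contains no
    matching wheel, instead of letting the Docker COPY step fail later.
    """
    if not os.path.isdir(wheel_dir):
        # e.g. python/target/wheels was never created because the wheel build failed
        raise FileNotFoundError(f"{wheel_dir} does not exist; did the wheel build run?")
    matches = sorted(f for f in os.listdir(wheel_dir) if fnmatch.fnmatch(f, pattern))
    if not matches:
        raise FileNotFoundError(f"no file matching {pattern!r} in {wheel_dir}")
    return matches
```

Calling find_wheels("python/target/wheels") from the repository root would either list the wheels that COPY will pick up or stop with an error naming the missing directory or pattern.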

@rahulpenti I ran this again. There is no wheels directory under python/target. I ran docker system prune -a beforehand to ensure that the builder Docker image was rebuilt.

andygrove, Oct 23 '22

That's weird. Given that you rebuilt the Docker image, can you run this command manually and share the logs? The maturin build command in the entrypoint script is responsible for building the wheels.

rahull-p, Oct 25 '22

OK, here is the issue: the script failed silently.

+ cd /home/builder/workspace/python
+ python3 -m venv venv
Error: [Errno 2] No such file or directory: '/home/builder/workspace/python/venv/bin/python3'
+ source venv/bin/activate

andygrove, Oct 25 '22

followed by:

+ python3 -m pip install -U pip
/usr/bin/python3: No module named pip
+ python3 -m pip install -r requirements-310.txt
/usr/bin/python3: No module named pip
+ maturin build
/home/builder/builder-entrypoint.sh: line 35: maturin: command not found

andygrove, Oct 25 '22

We should add set -e to the entrypoint script so that it stops immediately on a failure like this one:

+ python3 -m venv venv
Error: [Errno 2] No such file or directory: '/home/builder/workspace/python/venv/bin/python3'

andygrove, Oct 25 '22
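set -e makes a shell script abort at the first command that exits with a non-zero status. For a build driver written in Python, comparable fail-fast behavior can be sketched with subprocess.run(check=True). run_steps is a hypothetical helper for illustration only; the actual entrypoint is a shell script.

```python
import subprocess
import sys
from typing import List, Sequence


def run_steps(steps: Sequence[List[str]]) -> None:
    """Run commands in order, stopping at the first failure (hypothetical helper)."""
    for cmd in steps:
        # echo each step before running it, similar to `set -x` output
        print("+", " ".join(cmd), flush=True)
        # check=True raises CalledProcessError on a non-zero exit status, so later
        # steps never run; a missing executable raises FileNotFoundError instead,
        # which also stops the sequence (unlike "maturin: command not found" above)
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_steps([
        [sys.executable, "-c", "print('step 1 ok')"],
        [sys.executable, "-c", "import sys; sys.exit(1)"],  # fails here
        [sys.executable, "-c", "print('never printed')"],
    ])
```

With this structure, the failed venv creation would have ended the run right away instead of continuing to the pip and maturin steps.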

See https://github.com/apache/arrow-ballista/pull/444

andygrove, Oct 25 '22

@andygrove I was able to do a clean build (docker system prune and a fresh git clone to ensure there were no old artifacts) on both an Intel Mac and an M1 Mac. Which OS are you using? It shouldn't matter, though, since the build runs inside Docker.

Also, the error messages above are confusing. Judging from the paths they mention, the python3 -m venv venv command, which should run with the system default Python, appears to have used the virtualenv interpreter, while the subsequent commands, which should use the virtualenv, appear to have used the system default (/usr/bin/python3).

rahull-p, Oct 26 '22
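One way to confirm which interpreter a given step actually used is to compare sys.prefix with sys.base_prefix: they differ only when the running interpreter belongs to a virtual environment. The sketch below also creates the environment from Python and fails immediately if the expected venv/bin/python3 is missing, the condition reported in the log above. Both helpers are hypothetical; a POSIX bin/ layout is assumed, and pip is skipped (with_pip=False) to keep the sketch offline.

```python
import os
import sys
import venv


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base (system) installation.
    return sys.prefix != sys.base_prefix


def create_venv_or_fail(path: str) -> str:
    """Create a venv at path and return its python3 path; raise if it is missing."""
    venv.create(path, with_pip=False)  # pip skipped to keep the sketch offline
    python = os.path.join(path, "bin", "python3")  # POSIX layout assumed
    if not os.path.exists(python):
        raise FileNotFoundError(f"venv interpreter not found: {python}")
    return python


if __name__ == "__main__":
    print("interpreter:", sys.executable, "| in venv:", in_virtualenv())
```

Printing sys.executable and in_virtualenv() at the start of each step would show directly whether the system Python or the venv Python ran it.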

@rahulpenti I am also confused. I am on Ubuntu 20.04.4 LTS. I will investigate this more before the next release (so sometime in the next week or two).

andygrove, Nov 01 '22