datafusion-ballista
Document how to run TPC-H benchmarks in Kubernetes
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
I had to spend time figuring out how to deploy the benchmarks to Kubernetes, so I plan on documenting this.
Describe the solution you'd like
- Dockerfile for packaging up the benchmarks
- Example YAML for running the benchmarks as a pod
Here is the YAML I have been using:
apiVersion: v1
kind: Pod
metadata:
  name: tpch
  namespace: default
spec:
  containers:
    - image: andygrove/ballista-arm64
      command: ["/tpch",
        "benchmark",
        "--query=1",
        "--path=/mnt/tpch/parquet-sf100-partitioned/",
        "--format=parquet",
        "--concurrency=24",
        "--iterations=1",
        "--debug",
        "--host=ballista-scheduler",
        "--port=50050"]
      imagePullPolicy: Always
      name: tpch
      volumeMounts:
        - mountPath: /mnt/tpch/parquet-sf100-partitioned/
          name: data
  restartPolicy: Never
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-pv-claim
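The pod spec refers to a PersistentVolumeClaim named `data-pv-claim` but does not show it. For anyone reproducing this, a minimal sketch of such a claim might look like the following. Only the claim name and namespace come from the pod spec; the access mode and the requested size are assumptions and should be adjusted to match the volume that actually holds the TPC-H data.

```yaml
# Hypothetical claim matching the claimName used by the tpch pod.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pv-claim      # must match volumes[].persistentVolumeClaim.claimName
  namespace: default       # same namespace as the pod
spec:
  accessModes:
    - ReadWriteOnce        # assumption; a shared read-only mode may suit multi-node setups
  resources:
    requests:
      storage: 200Gi       # assumption; size this to fit the scale-factor-100 Parquet data
```

The claim has to be bound to a volume containing the partitioned Parquet files before the pod starts, since the benchmark reads from the mounted path directly.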
Describe alternatives you've considered
None
Additional context
None