Document how to run TPC-H benchmarks in Kubernetes

Open andygrove opened this issue 4 years ago • 0 comments

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

I had to spend time figuring out how to deploy the TPC-H benchmarks to Kubernetes, so I plan to document the process.

Describe the solution you'd like

  • Dockerfile for packaging up the benchmarks
  • Example YAML for running the benchmarks as a pod

Here is the YAML I have been using:

apiVersion: v1
kind: Pod
metadata:
  name: tpch
  namespace: default
spec:
  containers:
    - image: andygrove/ballista-arm64
      command: [ "/tpch",
                 "benchmark",
                 "--query=1",
                 "--path=/mnt/tpch/parquet-sf100-partitioned/",
                 "--format=parquet",
                 "--concurrency=24",
                 "--iterations=1",
                 "--debug",
                 "--host=ballista-scheduler",
                 "--port=50050"]
      imagePullPolicy: Always
      name: tpch
      volumeMounts:
        - mountPath: /mnt/tpch/parquet-sf100-partitioned/
          name: data
  restartPolicy: Never
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-pv-claim
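
The pod above mounts a claim named data-pv-claim, which has to exist in the same namespace before the pod can start. The issue does not show how that claim was defined, so the following is only a minimal sketch: the access mode and the requested storage size are assumptions chosen for illustration, and the cluster's default storage class is assumed to be able to provision the volume.

```yaml
# Hypothetical claim backing the "data" volume used by the tpch pod.
# Only the name and namespace come from the pod spec; everything
# under "spec" is an assumption and should be adapted to the cluster.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pv-claim
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce      # assumption: a single node mounts the data
  resources:
    requests:
      storage: 500Gi     # assumption: sized to hold the SF100 Parquet files
```

If the scheduler and executors read the same data from several nodes, an access mode such as ReadOnlyMany or ReadWriteMany may be needed instead, depending on what the storage backend supports.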

Describe alternatives you've considered

None

Additional context

None

andygrove, May 16 '21 14:05