skypilot
[Experimental] Sky Service
Description
Adds Sky broker support for Sky services, including provisioning and termination of those services. This PR specifically targets data analytics services (EMR on AWS and Dataproc on GCP).
Experiments
Runs TPC-DS (SF 1/100) on a 3-node Spark cluster launched by Sky.
Task YAML
service:
  type: data-analytics
  dependencies:
    spark: 3.1.2
resources:
  cloud: aws
num_nodes: 3
setup: |
  cd ~/
  source ~/.bashrc
  # Install build tools with yum, falling back to apt-get.
  (sudo yum -y install gcc make flex bison byacc git htop tmux) || (sudo apt-get -y install gcc make flex bison byacc git htop tmux)
  git clone https://github.com/michaelzhiluo/spark-sql-perf.git
  git clone https://github.com/databricks/tpcds-kit.git
  # Build the TPC-DS toolkit.
  cd tpcds-kit/tools
  make OS=LINUX
  mkdir -p ~/spark-warehouse/tpcds.db/
run: |
  cd ~/
  # Only the node with rank 0 drives the benchmark.
  if [ "${SKY_NODE_RANK}" = "0" ]
  then
    # Run gendata.scala first, then run.scala.
    echo :quit | spark-shell --jars ~/spark-sql-perf/target/scala-2.12/spark-sql-perf_2.12-0.5.1-SNAPSHOT.jar -i ~/spark-sql-perf/scripts/gendata.scala
    echo :quit | spark-shell --jars ~/spark-sql-perf/target/scala-2.12/spark-sql-perf_2.12-0.5.1-SNAPSHOT.jar -i ~/spark-sql-perf/scripts/run.scala
  else
    echo "Worker node"
  fi
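
The run section gates the benchmark on the node rank. As a minimal sketch of that pattern in isolation: the helper name `node_role` is hypothetical, and treating an unset or empty rank as a worker is an assumption of this sketch, not behavior confirmed by this PR. Quoting the variable matters because an empty, unquoted value inside `[ ... ]` makes the test fail with a syntax error rather than evaluate to false.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Sky): map a node rank to a role.
# Assumption: an empty or unset rank is treated as a worker.
node_role() {
  local rank="${1:-}"
  # Quoted operand and POSIX "=" keep the test valid even when rank is empty.
  if [ "${rank}" = "0" ]; then
    echo "head"
  else
    echo "worker"
  fi
}

# SKY_NODE_RANK is the variable used in the task YAML above.
role="$(node_role "${SKY_NODE_RANK:-}")"
echo "This node is a ${role} node"
```

In the task YAML, the same check would decide whether a node launches `spark-shell` or only prints its role.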