spark-ec2
spark-ec2 copied to clipboard
sparc
I am using spark-1.6.1-prebuilt-with-hadoop-2.6 on mac. I am using the spark-ec2 script to launch a cluster in Amazon VPC.
The setup.sh script [run first thing on master after launch] uses pssh and tries to install it via 'yum install -y pssh'. This step always fails on the master AMI that the script uses by default as it is not able to find it in the repo mirrors - hits 403, PYCURL ERROR 22.
Logging into the master and trying it manually does not work. I tried yum install -y python-pip and hit same issue. I tried editing the epel.repo as per amazon linux ami faq suggestion, but it didn't help. May be getting overridden.
For now, I have changed the script to not use pssh as a workaround. But would like to understand and fix the root cause.
I think this is because the Amazon VPC cluster might not have access to the internet.