--spark-ec2-compressed option added.

Open ar-ms opened this issue 8 years ago • 7 comments

Description

The --spark-ec2-compressed option lets you specify a compressed archive of spark-ec2. It is an alternative to cloning spark-ec2 from GitHub.

Accepted compression formats

.tar, .tar.gz, .tar.bz2, .tar.xz
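
As a rough illustration of how the four formats could be handled, the sketch below maps each suffix to the matching GNU tar decompression flag. The function name `tar_extract_command` and the example file name are hypothetical and not taken from the patch; the actual change may do this differently.

```python
import shlex

# Suffix -> GNU tar decompression flag (empty string means no compression).
TAR_FLAGS = {
    ".tar": "",
    ".tar.gz": "z",
    ".tar.bz2": "j",
    ".tar.xz": "J",
}


def tar_extract_command(archive_name: str) -> str:
    """Build a tar extraction command for one of the accepted formats.

    Hypothetical helper for illustration; raises ValueError for any
    other suffix.
    """
    for suffix, flag in TAR_FLAGS.items():
        if archive_name.endswith(suffix):
            return "tar -x{flag}f {name}".format(
                flag=flag, name=shlex.quote(archive_name))
    raise ValueError("unsupported archive format: " + archive_name)
```

For example, `tar_extract_command("spark-ec2.tar.bz2")` would produce a command that uses the `j` flag.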

ar-ms avatar Jul 21 '16 08:07 ar-ms

Could you explain the motivation for this change?

shivaram avatar Jul 21 '16 16:07 shivaram

I worked for a company that has a private GitLab. This feature was useful to me because I cannot access that GitLab from the outside.

It would also be nice to have an alternative to GitHub: if there is any problem with GitHub or the repository, you could continue to deploy clusters without wasting time.

ar-ms avatar Jul 25 '16 11:07 ar-ms

Hmm, but the git clone here happens on the master machine. Is the assumption that the master machine cannot access artifacts from the public internet? In that case, won't a lot of other things, like installing Spark or HDFS, also fail?

shivaram avatar Jul 28 '16 03:07 shivaram

It doesn't assume that the master has no access to the Internet.

It is meant for cases like these:

  • A GitHub service outage (https://status.github.com/messages)
  • The spark-ec2 repository gets corrupted or deleted...

ar-ms avatar Jul 28 '16 10:07 ar-ms

In that case, can we simplify this and just take a URL to a tgz that the master can download with wget? That will simplify the code further, and GitHub even has URLs of the form https://github.com/amplab/spark-ec2/archive/branch-1.6.zip

shivaram avatar Jul 28 '16 17:07 shivaram

Super idea :+1:! So we can remove git clone and rsync, and replace them with a simple wget?

ar-ms avatar Jul 29 '16 12:07 ar-ms

To be more conservative, I'd make the zip file path a command-line option. If the option is present, we can use wget; if not, it'll still use the existing code path.

shivaram avatar Jul 29 '16 16:07 shivaram
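
The approach agreed on above (an optional archive location, with the existing git path as the fallback) could be sketched roughly as follows. This is only an illustration under stated assumptions: the option name `--spark-ec2-archive-url`, the function names, and the `example.com` URL are hypothetical; the archive is assumed to be a gzipped tarball with a single top-level directory; and the git branch is a stand-in for the existing code path rather than a copy of it.

```python
import argparse
import shlex
from typing import Optional

# Repository and branch taken from the URL quoted in the thread.
DEFAULT_REPO = "https://github.com/amplab/spark-ec2"
DEFAULT_BRANCH = "branch-1.6"


def build_fetch_command(archive_url: Optional[str] = None,
                        repo: str = DEFAULT_REPO,
                        branch: str = DEFAULT_BRANCH) -> str:
    """Return a shell command that puts spark-ec2 into ./spark-ec2."""
    if archive_url:
        # New path: download the archive with wget and unpack it.
        # --strip-components=1 (GNU tar) drops the top-level directory.
        return (
            "rm -rf spark-ec2 spark-ec2.tgz && mkdir spark-ec2 && "
            "wget -q -O spark-ec2.tgz {url} && "
            "tar -xzf spark-ec2.tgz -C spark-ec2 --strip-components=1"
        ).format(url=shlex.quote(archive_url))
    # Fallback: stand-in for the existing git-based path.
    return "rm -rf spark-ec2 && git clone {repo} -b {branch} spark-ec2".format(
        repo=shlex.quote(repo), branch=shlex.quote(branch))


def parse_args(argv=None):
    parser = argparse.ArgumentParser()
    # Hypothetical option name, chosen only for this sketch.
    parser.add_argument("--spark-ec2-archive-url", default=None,
                        help="URL of a .tar.gz archive of spark-ec2")
    return parser.parse_args(argv)


# Usage: with the option set, the wget path is chosen.
opts = parse_args(["--spark-ec2-archive-url",
                   "https://example.com/spark-ec2.tgz"])  # placeholder URL
command = build_fetch_command(opts.spark_ec2_archive_url)
```

Keeping the git path as the default means that deployments which never pass the new option behave exactly as before.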