spark-ec2
Running Low on Storage when Building Specific Spark Version
Hi,
I am having trouble creating a Spark cluster with a custom Spark version. I am doing:
ec2/spark-ec2 --key-pair=<key-name> --identity-file=<key-file> --region=eu-west-1 --zone=eu-west-1a --vpc-id=<vpc-id> --subnet-id=<subnet-id> --copy-aws-credentials --hadoop-major-version=2 --instance-profile-name=<instance-profile-name> --slaves=1 -v 4f894dd6906311cb57add6757690069a18078783 launch cluster_test
-v points to a specific git commit by its hash (here, Spark version 1.5.1).
When the cluster nodes are started, Spark is cloned from git (into the /root folder) and built. After a while, the script stops with "no space left on device" errors. When I log into the master and check the space left:
>df
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/xvda1 8256952 6693968 1479128 82% /
tmpfs 3816808 0 3816808 0% /dev/shm
/dev/xvdb 433455904 1252616 410184984 1% /mnt
/dev/xvdf 433455904 203012 411234588 1% /mnt2
So about 1.4 GB are left on the device, but trying to download a big file fails again with the "no space left on device" message.
I realised that the inodes are the restricting factor here:
df -i
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/xvda1 524288 524288 0 100% /
tmpfs 954202 1 954201 1% /dev/shm
/dev/xvdb 27525120 12 27525108 1% /mnt
/dev/xvdf 27525120 11 27525109 1% /mnt2
Can someone help me increase the root disk volume? It might be good to increase the default volume size so that Spark can be built.
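For anyone debugging the same symptom, a quick way to confirm that inodes (rather than bytes) are exhausted is to count files per directory on the root filesystem. This is a generic diagnostic sketch, not part of spark-ec2:
# Count entries per directory on the root filesystem only (-xdev keeps find off /mnt and /mnt2);
# the directories with the highest counts are the likely inode hogs (typically the git
# checkout and the Maven/Ivy caches under /root).
find / -xdev -printf '%h\n' | sort | uniq -c | sort -rn | head -20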
Thanks @felixmaximilian for the report. The trouble is that increasing EBS volume size requires AMIs to be rebuilt for all the regions.
One workaround might be to build Spark on the ephemeral disk at /mnt. Could you see if that works, and if so we can make a code change for that?
Building Spark in /mnt/ is possible. Do you plan to copy the compiled Spark back to the EBS volume? Then it's necessary to make sure you don't copy the whole target folder etc. We should build the distribution (make-distribution.sh) in /mnt/ and then uncompress it back to the EBS volume. What do you think? We could try to jointly find a comfortable solution on Monday. Have a nice weekend.
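A rough sketch of that workaround, run manually on the master, might look like the following. The Maven profiles, paths, and the exact tarball name are illustrative assumptions and would need to match the Spark version and whatever spark_ec2.py actually expects:
# Clone and build on the large ephemeral disk instead of the small EBS root volume.
cd /mnt
git clone https://github.com/apache/spark.git
cd spark
git checkout 4f894dd6906311cb57add6757690069a18078783
# Build a binary distribution tarball (profiles are illustrative and depend on the
# desired Hadoop version).
./make-distribution.sh --name custom --tgz -Phadoop-2.4 -Pyarn
# Copy only the packaged distribution back to the root volume and unpack it there,
# so the huge target/ and .git directories never touch the EBS disk.
rm -rf /root/spark
mkdir -p /root/spark
tar xzf spark-*-bin-custom.tgz -C /root/spark --strip-components=1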
Just for the record, I'm running into this issue as well.
@felixmaximilian - Have you made any progress on solving this? I can help you write a patch, if you are interested in writing one.
A colleague created an AMI with much more (I guess EBS) space on the main (root) partition. I haven't really tried it again, but it should be solved that way.
Fixing this problem within the spark-ec2 code wasn't very successful on my side. I tried different things but ran into the problem that you cannot really do much on the externally mounted /mnt2, /mnt3, etc. while the cluster is starting, because they are added and removed during the process; I didn't really understand why. (The idea was to build on the external storage and then copy it back to root.) We can give it another try with combined forces :)
But another question: is 8 GB on the root partition really enough if just the installation files fit there? What about the HDFS in the ephemeral folder? As far as I can remember this also exists in root, which means we can hardly save anything to HDFS, right? It might be worth resizing all the AMIs to a bigger partition, or at least having another partition from the very beginning to be able to do stuff there.
Hmm, anything that requires updating all the spark-ec2 AMIs is a tough sell since that takes a lot of work and the process is not automated.
Yeah, just to clear some things up - AFAIK increasing the root partition size needs an AMI rebuild. However, I think we should be able to clone and build Spark on /mnt using make-distribution.sh and then unzip it to the root partition.
The HDFS thing is not really an issue -- the HDFS binaries are on /root but it uses /mnt on every machine for storage, so it can use all the ephemeral storage.
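For example, the data-directory setting can be checked on a running cluster. The config path and property name below assume the usual spark-ec2 ephemeral-hdfs layout and may differ depending on the Hadoop version:
# Path and property name are assumptions; adjust if the install differs.
grep -A 1 'dfs.data.dir' /root/ephemeral-hdfs/conf/hdfs-site.xml
# The value should list directories under /mnt (and /mnt2 if present), confirming that
# HDFS blocks land on the ephemeral disks rather than the EBS root volume.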
I think this issue can be resolved without having to do any work on the AMIs. See this comment.
+1