vagrant-hadoop-spark-hive
vagrant-hadoop-spark-hive copied to clipboard
Vagrant project to spin up a single virtual machine running current versions of Hadoop, Hive and Spark
Vagrant for Hadoop, Spark and Hive
Introduction
Vagrant project to spin up a single virtual machine running:
- Hadoop 2.7.2
- Hive 1.2.1
- Spark 1.6.0
The virtual machine will be running the following services:
- HDFS NameNode + NameNode
- YARN ResourceManager + JobHistoryServer + ProxyServer
- Hive metastore and server2
- Spark history server
Getting Started
- Download and install VirtualBox
- Download and install Vagrant.
- Run
vagrant box add centos65 https://github.com/2creatives/vagrant-centos/releases/download/v6.5.1/centos65-x86_64-20131205.box
- Go to releases and download and extract the latest source of this project.
- In your terminal change your directory into the project directory (i.e.
cd vagrant-hadoop-spark-hive-<version>
). - Run
vagrant up
to create the VM. - Execute
vagrant ssh
to login to the VM.
Web user interfaces
Here are some useful links to navigate to various UI's:
- YARN resource manager: (http://10.211.55.101:8088)
- Job history: (http://10.211.55.101:19888/jobhistory/)
- HDFS: (http://10.211.55.101:50070/dfshealth.html)
- Spark history server: (http://10.211.55.101:18080)
- Spark context UI (if a Spark context is running): (http://10.211.55.101:4040)
Validating your virtual machine setup
To test out the virtual machine setup, and for examples of how to run MapReduce, Hive and Spark, head on over to VALIDATING.md.
Starting services in the event of a system restart
Currently if you restart your VM then the Hadoop/Spark/Hive services won't be up (this is something I'll address soon). In the interim you can run the following commands to bring them up:
$ vagrant ssh
$ sudo -s
$ /vagrant/scripts/start-hadoop.sh
$ nohup hive --service metastore < /dev/null > /usr/local/hive/logs/hive_metastore_`date +"%Y%m%d%H%M%S"`.log 2>&1 </dev/null &
$ nohup hive --service hiveserver2 < /dev/null > /usr/local/hive/logs/hive_server2_`date +"%Y%m%d%H%M%S"`.log 2>&1 </dev/null &
$ /usr/local/spark/sbin/start-history-server.sh
More advanced setup
If you'd like to learn more about working and optimizing Vagrant then take a look at ADVANCED.md.
For developers
The file DEVELOP.md contains some tips for developers.
Credits
This project is based on the great work carried out at (https://github.com/vangj/vagrant-hadoop-2.4.1-spark-1.0.1).