VeloxDFS
DHT-based Distributed File System for MapReduce Jobs
VeloxDFS {#mainpage}
VeloxDFS is a decentralized distributed file system based on ChordDHT and HDFS.
This distributed file system is the foundation and an essential component of the Velox Big Data Framework (VBDF). However, it can be used independently of the other components, such as VeloxMR (the MapReduce framework).
Key features of the current VeloxDFS implementation include:
- No central directory service such as the HDFS NameNode.
- Logical block representation where the size adapts to the current workload.
- Hadoop API, so that VeloxDFS can be used instead of HDFS.
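To illustrate how a DHT can locate a file's owner without a central directory, here is a minimal sketch of the Chord successor rule: a key belongs to the first node at or after the key's position on a hash ring, wrapping around at the end. This is not VeloxDFS code. The `cksum` CRC stands in for whatever hash function VeloxDFS actually uses, and the function names and node names are hypothetical.

```shell
#!/bin/sh
# Map a string to a position on the ring (CRC via cksum, for illustration only).
ring_pos() {
  printf '%s' "$1" | cksum | cut -d' ' -f1
}

# owner_of KEY NODE...: print the node responsible for KEY.
owner_of() {
  key_pos=$(ring_pos "$1"); shift
  best=""; best_pos=""; first=""; first_pos=""
  for node in "$@"; do
    pos=$(ring_pos "$node")
    # Remember the node with the smallest position (used when we wrap around).
    if [ -z "$first" ] || [ "$pos" -lt "$first_pos" ]; then
      first=$node; first_pos=$pos
    fi
    # Remember the closest node at or after the key's position.
    if [ "$pos" -ge "$key_pos" ] && { [ -z "$best" ] || [ "$pos" -lt "$best_pos" ]; }; then
      best=$node; best_pos=$pos
    fi
  done
  printf '%s\n' "${best:-$first}"
}

# Every participant computes the same answer locally, with no directory lookup.
owner_of "input.txt" node-a node-b node-c
```

Because the result depends only on the key and the set of nodes, any node can compute it on its own. A full Chord implementation additionally keeps routing tables so that each node only needs to know a few peers.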
VELOX ECOSYSTEM
Related projects
| Project | Description | URL |
|---|---|---|
| VeloxMR | experimental MapReduce engine based on VeloxDFS | https://github.com/DICL/VeloxMR |
| eclipsed | deployment/debugging helper script written in Ruby | https://github.com/DICL/eclipsed |
| velox-hadoop | VeloxDFS Java library for Hadoop | https://github.com/DICL/velox-hadoop |
| velox-deploy-ansible | Ansible playbook to automate the Velox/Hadoop installation | https://github.com/DICL/velox-deploy-ansible |
| hadoop-etc | Hadoop configuration files to use VeloxDFS | https://github.com/vicentebolea/hadoop-etc |
| velox-report | Project to benchmark and log VeloxDFS performance | https://github.com/vicentebolea/velox-report |
USAGE
To deploy the system, use the veloxd command with one of the following options:
$ veloxd up|down|restart|status
The command line API has the following options:
$ veloxdfs put|get|cat|ls|rm|format|show
The C++ API can be found in the vdfs.hh and DFS.hh files. The Java API can be found in the src/java directory.
COMPILING & INSTALLING
Further information can be found in: Installation
Compiling requirements
- C++14 support, that is: GCC >= 4.9, Clang >= 3.4, or ICC >= 16.0.
- Boost library >= 1.56.
- SQLite3 library.
- GNU Autotools (Autoconf, Automake, Libtool).
- Unittest++ [optional].
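Before running the build, it can help to confirm that the tools above are on your PATH. The sketch below is one way to do that. The executable names (`g++`, `autoconf`, `automake`, `libtoolize`) are assumptions that vary between distributions and compilers, and `missing_tools` is a hypothetical helper, not part of VeloxDFS. The Boost and SQLite3 libraries are not checked here, since the configure script checks the requirements itself.

```shell
#!/bin/sh
# Print every name from the argument list that is not found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
  return 0
}

# Assumed executable names; adjust them to match your distribution.
missing=$(missing_tools g++ autoconf automake libtoolize)
if [ -n "$missing" ]; then
  echo "Missing build tools:" $missing
else
  echo "All listed build tools were found."
fi
```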
Single-user installation for developers
$ mkdir -p local_eclipse/{tmp,sandbox} # Create the sandbox directories
$ cd local_eclipse # Enter the directory
$ git clone git@github.com:DICL/VeloxDFS.git # Clone the project from GitHub
$ cd VeloxDFS
$ sh autogen.sh # Generate configure script
$ cd ../tmp # Go to the build folder
$ sh ../VeloxDFS/configure --prefix=`pwd`/../sandbox # Check requirements and generate the Makefile
# If you get a Boost error, go to the FAQ section of this README
### The following command is needed whenever you want to recompile the source
$ make [-j#] install # Compile and install; add the -j flag to speed up the build
Now add the following lines to your ~/.bashrc or ~/.profile:
export PATH="/home/*..PATH/To/eclipse/..*/sandbox/bin":$PATH
export LIBRARY_PATH="/home/*..PATH/To/eclipse/..*/sandbox/lib"
export C_INCLUDE_PATH="/home/*..PATH/To/eclipse/..*/sandbox/include"
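As a variation on the exports above, the install prefix can be kept in a single variable so that the three paths stay consistent. This is only a sketch: `VELOX_PREFIX` is a hypothetical variable name, and its value below assumes that `local_eclipse` was created in your home directory. The loop at the end reports whether the two commands named in this README can be found.

```shell
#!/bin/sh
# Hypothetical location: replace with the directory passed to --prefix.
VELOX_PREFIX="$HOME/local_eclipse/sandbox"

export PATH="$VELOX_PREFIX/bin:$PATH"
export LIBRARY_PATH="$VELOX_PREFIX/lib"
export C_INCLUDE_PATH="$VELOX_PREFIX/include"

# Report whether the shell can now find the commands named in this README.
for cmd in veloxd veloxdfs; do
  if command -v "$cmd" >/dev/null 2>&1; then
    echo "$cmd: found at $(command -v "$cmd")"
  else
    echo "$cmd: not found, check VELOX_PREFIX"
  fi
done
```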
Default settings for VELOXDFS
"log" : {
  "type" : "LOG_LOCAL6",
  "name" : "ECLIPSE",
  "mask" : "DEBUG"
},
"cache" : {
  "numbin" : 100,
  "size" : 200000,
  "concurrency" : 1
},
"filesystem" : {
  "block" : 137438953,
  "buffer" : 512,
  "replica" : 1
}
Further information can be found in: Conf reference
FAQ
- Question: configure stops with errors related to the Boost library.
- Answer: It probably means that the Boost library is not installed in the default location. In that case, you should specify the Boost library location:
sh ../VeloxDFS/configure --prefix ~/sandbox --with-boost=/usr/local --with-boost-libdir=/usr/local/lib
In this example we assume that the Boost headers are in /usr/local/include, while the library files are in /usr/local/lib.
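If you are unsure which prefix to pass to the configure script, the sketch below looks for the Boost headers under a few candidate prefixes and prints the version each one records. It relies on the `BOOST_LIB_VERSION` macro in `boost/version.hpp`, which Boost writes with an underscore (for example `1_56`). The `boost_version` helper and the list of prefixes are illustrative assumptions, not part of VeloxDFS.

```shell
#!/bin/sh
# Print the Boost version recorded under an installation prefix,
# or return 1 if no Boost headers are found there.
boost_version() {
  header="$1/include/boost/version.hpp"
  [ -f "$header" ] || return 1
  # BOOST_LIB_VERSION is defined as a quoted string such as "1_56".
  sed -n 's/^#define BOOST_LIB_VERSION "\([^"]*\)".*/\1/p' "$header"
}

# Prefixes to try; these are examples, not an exhaustive list.
for prefix in /usr /usr/local; do
  if version=$(boost_version "$prefix"); then
    echo "Boost $version headers under $prefix/include"
  fi
done
```

A prefix reported here is a candidate for the --with-boost option, provided the version meets the 1.56 minimum listed in the compiling requirements.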
AUTHORS
- Vicente Adolfo Bolea Sanchez
- MooHyeon Nam
- WonBae Kim
- KiBeom Jin
- Deukyeon Hwang
- Prof. Nam Beomseok
- INSTITUTION: DICL laboratory at UNIST