docker-hadoop
How to upload local file to HDFS?
I'm new to both Hadoop and Docker, and I want to try the wordcount program on my own files, so I need to put a file into /input/. I followed the Makefile:
docker run --network ${DOCKER_NETWORK} --env-file ${ENV_FILE} bde2020/hadoop-base:$(current_branch) hdfs dfs -mkdir -p /input/
docker run --network ${DOCKER_NETWORK} --env-file ${ENV_FILE} bde2020/hadoop-base:$(current_branch) hdfs dfs -copyFromLocal -f /opt/hadoop-3.2.1/README.txt /input/
and then:
docker run --network ${DOCKER_NETWORK} --env-file ${ENV_FILE} bde2020/hadoop-base:$(current_branch) hdfs dfs -cat /input/*
This shows me the README.txt. But when I replace '/opt/hadoop-3.2.1/README.txt' in the -copyFromLocal command with a file from my local filesystem, it says "copyFromLocal: No such file or directory". So I wondered where the file '/opt/hadoop-3.2.1/README.txt' actually is, and I found that I do not have a 'hadoop-3.2.1' directory in /opt on my local filesystem. How can I upload my own file into /input/? Where is '/opt/hadoop-3.2.1/'?
Even 'docker run --network docker-hadoop_default --env-file hadoop.env bde2020/hadoop-base:latest hdfs dfs -cat /opt/hadoop-3.2.1/README.txt' says: cat: `/opt/hadoop-3.2.1/README.txt': No such file or directory
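Presumably that path exists only inside the container's filesystem, not on the host, and hdfs dfs -cat reads from HDFS rather than from the local filesystem, which would explain both errors. A plain ls in a throwaway container (assuming the same image) should show the file:
# 'ls' reads the container's local filesystem, where the file does exist;
# 'hdfs dfs -cat' reads HDFS, where it does not
docker run --rm bde2020/hadoop-base:latest ls -l /opt/hadoop-3.2.1/README.txt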
+1 on this.
I found the answer. You'll need two terminals open.
In the first terminal:
# Get a bash terminal on the namenode
docker exec -it namenode bash
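# Note: the shell lands in the container's root directory (/), so the two
# directories created below end up as /input and /jars, the paths that the
# docker cp commands in the second terminal copy into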
# Make a directory for storing input files
mkdir input
# Make another directory for storing jar files
mkdir jars
In the second terminal:
# Add the file into the namenode's file system
# Find the container ID of your namenode container using docker container ls.
# It should be something like cb0c13085cd3.
docker cp relative/path/to/local/file/you/want/to/copy/into/hadoop.txt <NAMENODE-CONTAINER-ID>:/input/
# Add the wordcount jar into the namenode's file system
# This repository's submit folder already contains a compiled WordCount.jar file
docker cp submit/WordCount.jar <NAMENODE-CONTAINER-ID>:/jars/
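Note that docker cp also accepts the container name, so since this compose setup names the container namenode (as the docker exec command above relies on), you can skip looking up the container ID:
# Same copies, addressing the container by name instead of ID
docker cp path/to/hadoop.txt namenode:/input/
docker cp submit/WordCount.jar namenode:/jars/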
In the first terminal:
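# Copy the staged files from the container's local filesystem into HDFS.
# A relative HDFS path like 'input' resolves to /user/<current-user>/input
# (typically /user/root in these containers).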
hadoop fs -mkdir -p input
hdfs dfs -put ./input/* input
# Now run the executable
hadoop jar jars/WordCount.jar org.apache.hadoop.examples.WordCount input output
# View the output
hdfs dfs -ls output/
hdfs dfs -cat output/part-r-00000
You should see the output from the WordCount map/reduce task.
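As a single-terminal variant, the same flow should also work by passing each command through docker exec instead of opening an interactive shell. A minimal sketch, assuming the container is named namenode and using /tmp (which always exists) as the staging area:
# Stage the input file and the jar inside the container
docker cp path/to/hadoop.txt namenode:/tmp/hadoop.txt
docker cp submit/WordCount.jar namenode:/tmp/WordCount.jar
# Load the input into HDFS and run the job
docker exec namenode hdfs dfs -mkdir -p input
docker exec namenode hdfs dfs -put -f /tmp/hadoop.txt input
docker exec namenode hadoop jar /tmp/WordCount.jar org.apache.hadoop.examples.WordCount input output
# View the output
docker exec namenode hdfs dfs -cat output/part-r-00000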
I got a lot of help from this tutorial.
Can I execute these commands from any directory? Do I need to enter that container first?