
"New to Hadoop" use case

Open · dav-ell opened this issue 5 years ago • 5 comments

I'm completely new to Hadoop, and I found this repo because I had the thought, "Wow, installing Hadoop is hard, and all I want is HDFS. Surely there's got to be an easier way to do this. Maybe someone made a Docker container!"

Indeed, this repo does an amazing job of getting all the complicated details out of the way. But there are a number of questions that are left unanswered after getting this running. I thought it'd be useful to list them here:

  • Why is only the namenode port (9870) forwarded in docker-compose.yml? I opened up the other ports (8088, 8042, 9864, and 8188) in the other services and can access all the UIs now (see the sketch after this list).
  • How do I connect other hosts to the one I spun up with docker-compose up? It'd be amazing to be able to do something like docker-compose up (and maybe another command) on another host and have them connect.
  • How do I get started with uploading data to HDFS? I tried using "Upload" under Utilities -> Browse the file system at http://localhost:9870, and it failed.
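
For the first bullet, here's a sketch of the extra port mappings I added. The service names are the ones I see in this repo's docker-compose.yml, so double-check them against your copy:

resourcemanager:
    ports:
        - "8088:8088"    # YARN ResourceManager UI
nodemanager1:
    ports:
        - "8042:8042"    # NodeManager UI
datanode:
    ports:
        - "9864:9864"    # datanode web UI / WebHDFS
historyserver:
    ports:
        - "8188:8188"    # history/timeline server UI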

These questions will probably be answered just by working with Hadoop more, but I thought they could help you guys if you're looking to reach newcomers. Lots of university students, especially those doing data science/engineering, are starting to feel the need to get familiar with tools like this.

dav-ell avatar Nov 06 '19 13:11 dav-ell

How do I get started with uploading data to HDFS?

This question is answered by mounting your local directory into the datanode container, like this:

datanode:
    volumes:
        - hadoop_datanode:/hadoop/dfs/data
        - /home/me:/home/me

Then running docker exec -it [datanode-id] hdfs dfs -put /home/me/file /hdfs/location/file.
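
For example, assuming the compose file names the container datanode (check with docker ps; the paths below are just placeholders), the full round trip looks roughly like this:

docker ps --filter name=datanode                       # find the datanode container
docker exec -it datanode hdfs dfs -mkdir -p /data      # create a target directory in HDFS
docker exec -it datanode hdfs dfs -put /home/me/file /data/file
docker exec -it datanode hdfs dfs -ls /data            # confirm the file landed in HDFS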

dav-ell avatar Nov 07 '19 14:11 dav-ell

How do I get started with uploading data to HDFS? I tried using "Upload" under Utilities -> Browse the file system at http://localhost:9870, and it failed.

I ran into the same trouble as you when I tried to use WebHDFS to operate on files. "Upload" under Utilities -> Browse the file system reports a failure; it looks like WebHDFS is not working.

heeeeeeeeeeeelp

2qif49lt avatar Nov 22 '19 09:11 2qif49lt

I know why now: the WebHDFS REST API returns the datanode's hostname, not its IP, and the browser cannot resolve that hostname.

2qif49lt avatar Nov 22 '19 09:11 2qif49lt

I know why now: the WebHDFS REST API returns the datanode's hostname, not its IP, and the browser cannot resolve that hostname.

The WebHDFS REST API redirects the request to a datanode, but it uses the datanode's hostname. Docker's network is separate from the host machine's, which means the host cannot connect to the datanode inside Docker by that name.

So I added a forward proxy service using nginx to the docker-compose file and set up a proxy server in my browser. It works, but not very well: I can download files from HDFS via WebHDFS, but I have to change the hostname to the IP address manually.

I noticed that you have the same problem. How can I get WebHDFS to return an IP address instead of the datanode's hostname? @2qif49lt
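
One workaround I have seen suggested for this kind of hostname problem (a sketch only, not verified against this repo): publish the datanode's HTTP port 9864 in docker-compose.yml, then on the machine running the browser map the container hostname that appears in the WebHDFS redirect URL to the Docker host, so the redirect resolves:

# "datanode" is an assumption; use whatever hostname shows up in the redirect URL
echo "127.0.0.1 datanode" | sudo tee -a /etc/hosts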

FHamster avatar Mar 30 '20 08:03 FHamster

There is a great answer by @earthquakesan on how to access and cp files to the hadoop fs here btw: https://github.com/big-data-europe/docker-hadoop-spark-workbench/issues/28#issuecomment-315528621
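
The general shape of that approach (my paraphrase, so details may differ from the linked comment; the container name namenode and the file name are assumptions) is to copy the file into the namenode container and put it into HDFS from there:

docker cp ./myfile.csv namenode:/tmp/myfile.csv        # host -> namenode container
docker exec -it namenode hdfs dfs -mkdir -p /user/root
docker exec -it namenode hdfs dfs -put /tmp/myfile.csv /user/root/myfile.csv
docker exec -it namenode hdfs dfs -ls /user/root       # verify the upload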

otosky avatar Apr 18 '20 20:04 otosky