docker-hadoop-spark-workbench

How to read data into Spark from HDFS?

Open · aleksandarskrbic opened this issue Jul 29 '19 · 1 comment

I copied files from the local filesystem into the namenode container and then copied them into HDFS under the "/user/root/data" path. Now I have a problem loading that data from HDFS into my local Spark application with spark.read.format("json").load("hdfs://127.0.1.1:50070/user/root/data/file_name.json"). The problem is the URL to the data: I tried hdfs://127.0.1.1:50070, hdfs://localhost:50070, hdfs://namenode:50070, and hdfs://namenode:8020, and none of them work. Is anyone else having a similar problem?

— aleksandarskrbic, Jul 29 '19 18:07

Just randomly found this thread. Try hdfs://namenode:9000/myfolder/file.csv.

HDFS is not accessed through the same port as the web administration UI: 50070 serves the namenode web interface, while Spark needs the HDFS RPC port (9000 here).

— jakobhviid, Nov 15 '19 14:11
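
For reference, a minimal sketch of the suggested fix, assuming the namenode container is reachable from the Spark application under the hostname "namenode" (e.g. both are on the same Docker network or the host maps it in /etc/hosts), that the HDFS RPC service listens on port 9000 as suggested above, and that the file path from the original question is used:

```scala
import org.apache.spark.sql.SparkSession

object HdfsReadExample {
  def main(args: Array[String]): Unit = {
    // Local Spark application; adjust master as needed.
    val spark = SparkSession.builder()
      .appName("hdfs-read-example")
      .master("local[*]")
      .getOrCreate()

    // Point at the namenode's RPC port (9000), not the web UI port (50070).
    val df = spark.read
      .format("json")
      .load("hdfs://namenode:9000/user/root/data/file_name.json")

    df.show()
    spark.stop()
  }
}
```

If the hostname "namenode" does not resolve from outside Docker, replacing it with the host's IP while keeping port 9000 (published by the namenode container) should behave the same way.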