docker-hadoop-spark-workbench
How to read data into Spark from HDFS?
I copied files from the local filesystem to the namenode container and then copied them to HDFS under the "/user/root/data" path. Now I have a problem loading the data from HDFS into a local Spark application:
spark.read.format("json").load("hdfs://127.0.1.1:50070/user/root/data/file_name.json")
The problem is the URL to the data. I tried hdfs://127.0.1.1:50070, hdfs://localhost:50070, hdfs://namenode:50070 and hdfs://namenode:8020, and none of them works. Is anyone else having a similar problem?
Just randomly found this thread. Try hdfs://namenode:9000/myfolder/file.csv.
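HDFS is not accessed through the same port as the web administration UI. Port 50070 serves the namenode's web interface over HTTP, while hdfs:// URLs have to point at the namenode's RPC port.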
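
For anyone landing here later, a minimal PySpark sketch of the suggestion above. It assumes the namenode RPC port is 9000 as in the reply (the real value is whatever fs.defaultFS is set to in your setup) and that the hostname "namenode" resolves from wherever Spark runs, which is normally only true inside the Docker network. The helper names hdfs_uri, read_json_from_hdfs and main are made up for illustration.

```python
def hdfs_uri(host: str, port: int, path: str) -> str:
    """Build an hdfs:// URI pointing at the namenode RPC port (not the web UI port)."""
    if not path.startswith("/"):
        raise ValueError("HDFS path must be absolute, e.g. /user/root/data/file.json")
    return f"hdfs://{host}:{port}{path}"


def read_json_from_hdfs(spark, uri: str):
    """Load a JSON file from HDFS into a DataFrame, same call as in the question."""
    return spark.read.format("json").load(uri)


def main():
    # Imported here so the helpers above can be used without pyspark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("hdfs-read-sketch").getOrCreate()
    # Assumption: host "namenode" and port 9000, taken from the reply above.
    uri = hdfs_uri("namenode", 9000, "/user/root/data/file_name.json")
    df = read_json_from_hdfs(spark, uri)
    df.show()
    spark.stop()
```

Call main() from a script or shell that can reach the namenode. If the port is in doubt, running `hdfs getconf -confKey fs.defaultFS` inside the namenode container prints the URI that HDFS clients are expected to use.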