docker-hadoop-spark-workbench

[EXPERIMENTAL] This repo includes deployment instructions for running HDFS/Spark inside Docker containers. It also includes spark-notebook and the HDFS FileBrowser.

20 docker-hadoop-spark-workbench issues

Hi, when I try to write a parquet file into HDFS, I get the issue below: File /data_set/hello/crm_last_month/2534cb7a-fc07-401e-bdd3-2299e7e657ea.parquet could only be replicated to 0 nodes instead of minReplication (=1). There are 1...
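This error usually means the namenode accepted the file but the client could not reach any datanode to write the blocks. When the client runs outside the Docker network, one commonly suggested workaround (the config key is standard HDFS; everything else here is an assumption about this setup) is to make the client address datanodes by hostname instead of their Docker-internal IP:

```
import org.apache.spark.sql.SparkSession

// Sketch: resolve datanodes by hostname rather than internal IP; this only
// helps if those hostnames are resolvable (e.g. via /etc/hosts or
// extra_hosts) on the machine running the driver.
val spark = SparkSession.builder().appName("parquet-write").getOrCreate()
spark.sparkContext.hadoopConfiguration
  .set("dfs.client.use.datanode.hostname", "true")
```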

I am trying a simple Scala Spark application, as follows:

```
val conf = new SparkConf()
  .setMaster("spark://192.168.20.108:7077")
  .setAppName("BI-SERVICE")
val numbersRdd = sc.parallelize((1 to 10000).toList)
numbersRdd.saveAsTextFile("hdfs://192.168.20.108:8020/numbers-as-text02")
```

However, the Spark job is still running...
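When a job like this sits in RUNNING forever against a dockerized master, a frequent cause is that the executors cannot connect back to the driver. A minimal sketch of one thing to try (the driver address is an assumption, reusing the host IP from the issue):

```
import org.apache.spark.{SparkConf, SparkContext}

// Advertise an address the workers can actually route to; otherwise the
// driver may announce a hostname/IP that only exists on the client machine.
val conf = new SparkConf()
  .setMaster("spark://192.168.20.108:7077")
  .setAppName("BI-SERVICE")
  .set("spark.driver.host", "192.168.20.108") // assumed reachable from workers
val sc = new SparkContext(conf)
sc.parallelize((1 to 10000).toList)
  .saveAsTextFile("hdfs://192.168.20.108:8020/numbers-as-text02")
```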

![image](https://user-images.githubusercontent.com/12899262/123443434-7a028500-d608-11eb-86d1-23cf6b3c33bf.png)

```
2021-06-25 14:56:02.50 UTC spark-master-7bd8dfcc44-r8ks7 spark-master 21/06/25 14:56:02 INFO util.SignalUtils: Registered signal handler for TERM
2021-06-25 14:56:02.50 UTC spark-master-7bd8dfcc44-r8ks7 spark-master 21/06/25 14:56:02 INFO util.SignalUtils: Registered signal handler for HUP
2021-06-25 14:56:02.50 UTC spark-master-7bd8dfcc44-r8ks7 spark-master 21/06/25 14:56:02 INFO util.SignalUtils:...
```

I was able to start the Hive server and metastore service using this project (bde2020/hive) on a Windows machine, and the Spark shell running on Windows is able to connect to this...
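For reference, a minimal sketch of connecting a Spark session to an external Hive metastore; the thrift URI and port (9083 is the metastore default) are assumptions about this setup, not taken from the issue:

```
import org.apache.spark.sql.SparkSession

// Point Spark at the metastore running in the container; assumes port 9083
// is published to the Windows host.
val spark = SparkSession.builder()
  .appName("hive-smoke-test")
  .config("hive.metastore.uris", "thrift://localhost:9083")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("SHOW DATABASES").show()
```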

How to start beeline in the hive-server container
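A likely approach, assuming the container is named `hive-server` as in the bde2020 compose files and HiveServer2 listens on its default port 10000:

```
docker exec -it hive-server bash
# inside the container: connect beeline to HiveServer2 on its default port
beeline -u jdbc:hive2://localhost:10000
```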

I am working with Spark notebooks, following [Scalable Spark/HDFS Workbench using Docker](https://www.big-data-europe.eu/scalable-sparkhdfs-workbench-using-docker/):

```
val textFile = sc.textFile("/user/root/vannbehandlingsanlegg.csv")
```

which returns

```
textFile: org.apache.spark.rdd.RDD[String] = /user/root/vannbehandlingsanlegg.csv MapPartitionsRDD[1] at textFile at <console>:67
```

It will show...
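Note that this output only means the RDD has been defined: `textFile` is lazy, so nothing is actually read from HDFS until an action runs. A small sketch to force the read (run in the notebook, where `sc` is predefined; the path is the one from the issue):

```
val textFile = sc.textFile("/user/root/vannbehandlingsanlegg.csv")
// count() is an action: this is where a missing file or an unreachable
// datanode would actually surface as an error.
println(textFile.count())
```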

I have:

- namenode (10.0.5.x) on machine-1
- spark master (10.0.5.x) on machine-1
- network-endpoint (10.0.5.3) on machine-2
- spark worker (10.0.5.x) on machine-2
- datanode (10.0.5.x) on machine-2

My code runs on the spark master (using pyspark):

```
text = sc.textFile("hdfs://namenode:9000/path/file")
text.collect()
```
...
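A common reason `collect()` hangs in a two-machine layout like this: the namenode returns datanode addresses from Docker's internal network, which the driver on machine-1 may not be able to route to. A hedged first step (shown in Scala, though the issue uses pyspark) is to fetch a tiny sample instead of collecting everything, so a connectivity failure surfaces quickly:

```
val text = sc.textFile("hdfs://namenode:9000/path/file")
// take(5) still has to fetch a block from a datanode, so if this hangs too,
// the driver most likely cannot reach the datanode's advertised address.
text.take(5).foreach(println)
```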

In the Makefile, the Traefik version is not pinned, and the syntax used is incompatible with the current version (2)
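A hedged workaround, assuming the Makefile pulls or runs a bare `traefik` image reference: pin a 1.x tag, since Traefik 2 is not a drop-in replacement for v1 configuration, e.g.

```
# hypothetical: pin the image wherever the Makefile references `traefik`
docker pull traefik:v1.7
```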

I copied files from the local filesystem to the namenode container and then copied them to HDFS under the "/user/root/data" path. Now I have a problem loading data from HDFS into local...
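For reference, a minimal sketch of the round trip; the container name `namenode` matches the workbench compose files, while the file names are assumptions:

```
# host -> container -> HDFS
docker cp data.csv namenode:/tmp/data.csv
docker exec namenode hdfs dfs -put /tmp/data.csv /user/root/data/

# HDFS -> container -> host
docker exec namenode hdfs dfs -get /user/root/data/data.csv /tmp/out.csv
docker cp namenode:/tmp/out.csv ./out.csv
```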