
The Internals of Apache Spark

7 apache-spark-internals issues

## Want help refining the remaining `FIXME` in the `addFile` function. Please help review; if you think any of the interpretations are inappropriate, I will refine them. ### Changes: Firstly, `addFile` validates...
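
For context, a minimal sketch of how `addFile` is typically used from the public API (the file path and app name here are hypothetical placeholders):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkFiles

object AddFileDemo {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[*]").setAppName("addFile-demo"))

    // addFile validates the given path (scheme, existence) and registers
    // the file with the driver so executors can fetch a local copy.
    sc.addFile("/tmp/lookup.txt") // hypothetical local file

    // On executors (and the driver), SparkFiles.get resolves the local copy.
    val rdd = sc.parallelize(1 to 4).map { i =>
      val path = SparkFiles.get("lookup.txt")
      s"task $i sees $path"
    }
    rdd.collect().foreach(println)

    sc.stop()
  }
}
```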

Hey, is it possible to build an HTML or PDF (preferred) version of this book for offline reading? What would be the process to do that? Thanks

It seems that some styling is broken:

https://books.japila.pl/apache-spark-internals/overview/
![image](https://user-images.githubusercontent.com/247218/106383107-38a4db00-63c4-11eb-9b5c-61f8e4552e72.png)

https://books.japila.pl/apache-spark-internals/rdd/
![image](https://user-images.githubusercontent.com/247218/106383115-45c1ca00-63c4-11eb-8cdc-e295bd54ecaf.png)
![image](https://user-images.githubusercontent.com/247218/106383119-57a36d00-63c4-11eb-9875-e71d86d93934.png)

Hi, thanks for writing the book. In the [introduction](https://jaceklaskowski.gitbooks.io/mastering-apache-spark/spark-rdd-partitions.html) to RDD partitioning, you mention that the `filter` operation does not preserve partitioning, but I'm looking at the [source code](https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L387) where...
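
For reference, `filter` does appear to preserve the parent RDD's partitioner: it builds a `MapPartitionsRDD` with `preservesPartitioning = true`. A minimal sketch to check this (the local master and sample data are arbitrary):

```scala
import org.apache.spark.HashPartitioner
import org.apache.spark.{SparkConf, SparkContext}

object FilterPartitioning {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("filter-partitioning"))

    val pairs = sc.parallelize(Seq(1 -> "a", 2 -> "b", 3 -> "c"))
      .partitionBy(new HashPartitioner(2))

    println(pairs.partitioner)                  // Some(HashPartitioner)
    println(pairs.filter(_._1 > 1).partitioner) // still Some(...): filter preserves it
    println(pairs.map(identity).partitioner)    // None: map may change keys, so it does not

    sc.stop()
  }
}
```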

I currently wonder what a good Spark configuration looks like when a lot of memory is available on a single node: http://stackoverflow.com/questions/43262870/spark-with-high-memory-use-multiple-executors-per-node Maybe you could add some clarifying hints...
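
As an illustration only (not a recommendation from the book), one common approach is to run several smaller executors per node rather than one huge JVM heap, since very large heaps tend to suffer from long GC pauses. The keys below are standard Spark configuration properties, but the values are made-up placeholders, and how many executors actually land on one node depends on the cluster manager:

```scala
import org.apache.spark.sql.SparkSession

// Sketch: four 24g executors instead of a single 96g one, to keep
// per-JVM heaps (and GC pauses) manageable. Sizes are placeholders.
val spark = SparkSession.builder()
  .appName("many-executors-per-node")
  .config("spark.executor.instances", "4") // several smaller executors...
  .config("spark.executor.memory", "24g")  // ...instead of one huge heap
  .config("spark.executor.cores", "5")     // moderate per-executor core count
  .getOrCreate()
```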

Could you explain the usage of off-heap memory in https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-MemoryManager.html better? For me (http://stackoverflow.com/questions/43330902/spark-off-heap-memory-config-and-tungsten) it is unclear whether off-heap is now (2.1.0) enabled automatically, since Tungsten is enabled by...
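
For context: to my knowledge, off-heap memory is not enabled automatically regardless of Tungsten; it is gated by two explicit Spark properties. A minimal sketch:

```scala
import org.apache.spark.sql.SparkSession

// Off-heap memory must be opted into explicitly; it is off by default.
val spark = SparkSession.builder()
  .appName("off-heap-demo")
  .config("spark.memory.offHeap.enabled", "true") // default: false
  .config("spark.memory.offHeap.size", "1g")      // must be > 0 when enabled
  .getOrCreate()
```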

Hello. Reading the chapter https://jaceklaskowski.gitbooks.io/mastering-apache-spark/content/spark-logging.html, Log4J 2 is mentioned and linked (docs, etc.). However, I have found that Apache Spark (to my surprise) uses the old Log4J 1. Here is the...
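
The observation matches my understanding: Spark releases of that era bundled Log4J 1.x (the switch to Log4J 2 came much later, in the 3.x line). So a programmatic log-level tweak at the time would have gone through the Log4J 1 API, for example:

```scala
import org.apache.log4j.{Level, Logger}

// Spark of that era shipped with Log4J 1.x, so its loggers are
// adjusted through the org.apache.log4j API:
Logger.getLogger("org.apache.spark").setLevel(Level.WARN)

// Alternatively, SparkContext exposes a convenience method:
// sc.setLogLevel("WARN")
```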