Josh Rosen
After looking at this some more, I'm not entirely clear on what the semantics should be for a few important corner cases where there is a significant mismatch between the Spark...
@liancheng, do you know whether Spark's Parquet data source supports this? Would this even be possible in this library or is this request inherently out of scope w.r.t. this library's...
Hi @msifalakis, My hunch is that this is a longstanding bug. It wouldn't surprise me if nobody has tried running two instances of `spark-perf` at the same time on the...
`spark-perf` itself does not support collecting compute resource utilization metrics (memory, CPU, I/O). `spark-ec2` clusters are launched with Ganglia installed, so it should be possible to pull...
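For reference, here's a minimal sketch of what pulling those metrics could look like, assuming Ganglia's `gmond` daemon is reachable on its default TCP port (8649), where it serves its current metric state as XML. The host name and the specific metrics extracted below are illustrative; none of this is provided by `spark-perf`:

```python
import socket
import xml.etree.ElementTree as ET

GMOND_HOST = "ec2-master.example.com"  # hypothetical master host
GMOND_PORT = 8649                      # gmond's default XML dump port


def poll_gmond(host=GMOND_HOST, port=GMOND_PORT):
    """Fetch gmond's full XML metric dump and return it parsed."""
    chunks = []
    with socket.create_connection((host, port), timeout=10) as sock:
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return ET.fromstring(b"".join(chunks))


def cpu_and_memory(tree):
    """Yield (host, metrics) pairs for a couple of standard Ganglia metrics."""
    for host in tree.iter("HOST"):
        metrics = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
        yield host.get("NAME"), {
            "cpu_idle": metrics.get("cpu_idle"),
            "mem_free": metrics.get("mem_free"),
        }


if __name__ == "__main__":
    for name, vals in cpu_and_memory(poll_gmond()):
        print(name, vals)
```

Polling this periodically while a test runs would give a rough utilization timeline per host, without any changes to `spark-perf` itself.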
Also, don't require the `slaves` file: in 1.2+, the absence of a `slaves` file causes us to just run a cluster with a single local worker. We still need this...
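In case it helps, a minimal sketch of that fallback behavior; the path and helper name are hypothetical, not `spark-perf`'s actual code:

```python
import os


def read_slaves(path="/root/spark-ec2/slaves"):  # hypothetical path
    """Return the worker host list, defaulting to a single local worker
    when no slaves file exists (mirroring the 1.2+ behavior above)."""
    if not os.path.exists(path):
        return ["localhost"]
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]
```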
The `slaves` file issue was fixed by #29, but I think we still have to fix the `spark-env.sh` one.
I don't think we should include build archiving as part of `merge_spark_pr.py` because (a) it doesn't address the issue of bisecting over older builds, and (b) it would slow down...
I hadn't fully thought this out as a general service for public consumption; the original scope / idea here was something narrowly targeted at a specific problem that I encountered...
The numbers in the `results` array are the raw timing measurements for the runs of the test, listed in the order in which the tests ran (which is why the first...
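To make that concrete, here's a minimal sketch of summarizing one test's timings under the assumptions above: a record with a `results` list of per-run timings in run order, whose first entries are warm-up runs. The warm-up count and field names are illustrative, not a documented `spark-perf` schema:

```python
import statistics


def summarize(record, num_warmup_runs=2):
    """Summarize one test's raw timings, skipping the leading warm-up runs.

    `record` is assumed to be a dict with a "results" list of per-run
    timings in run order; `num_warmup_runs` is a hypothetical knob, since
    the real warm-up count depends on the test's configuration.
    """
    timings = record["results"]
    # The first runs are typically slower (JIT compilation, cold caches),
    # so drop them before computing summary statistics.
    measured = timings[num_warmup_runs:]
    return {
        "median": statistics.median(measured),
        "min": min(measured),
        "max": max(measured),
    }


# Example with made-up numbers:
print(summarize({"results": [12.1, 9.8, 8.9, 9.0, 8.8]}))
# {'median': 8.9, 'min': 8.8, 'max': 9.0}
```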
Is this a bug in `spark-perf`? A problem with Spark 1.5.1? I'm trying to figure out if this issue is actionable.