Josh Rosen
After looking at this some more, I'm not entirely clear on what the semantics should be for a few important corner cases where there is a significant mismatch between the Spark...
@liancheng, do you know whether Spark's Parquet data source supports this? Would this even be possible in this library or is this request inherently out of scope w.r.t. this library's...
Hi @msifalakis, My hunch is that this is a longstanding bug. It wouldn't surprise me if nobody has tried running two instances of `spark-perf` at the same time on the...
`spark-perf` itself does not support collecting compute resource utilization metrics (memory, CPU, I/O). `spark-ec2` clusters are launched with Ganglia installed, so it should be possible to pull...
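For reference, here's a minimal sketch of what pulling those metrics could look like, assuming Ganglia's `gmond` daemon is reachable on its default TCP port (8649), where it serves its current metric state as XML. The host name and the specific metrics extracted below are illustrative; none of this is provided by `spark-perf`:

```python
import socket
import xml.etree.ElementTree as ET

GMOND_HOST = "ec2-master.example.com"  # hypothetical master host
GMOND_PORT = 8649                      # gmond's default XML dump port


def poll_gmond(host=GMOND_HOST, port=GMOND_PORT):
    """Fetch gmond's full XML metric dump and return it parsed."""
    chunks = []
    with socket.create_connection((host, port), timeout=10) as sock:
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return ET.fromstring(b"".join(chunks))


def cpu_and_memory(tree):
    """Yield (host, metrics) pairs for a couple of standard Ganglia metrics."""
    for host in tree.iter("HOST"):
        metrics = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
        yield host.get("NAME"), {
            "cpu_idle": metrics.get("cpu_idle"),
            "mem_free": metrics.get("mem_free"),
        }


if __name__ == "__main__":
    for name, vals in cpu_and_memory(poll_gmond()):
        print(name, vals)
```

Polling this periodically while a test runs would give a rough utilization timeline per host, without any changes to `spark-perf` itself.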
Also, don't require the `slaves` file: in 1.2+, the absence of a `slaves` file causes us to just run a cluster with a single local worker. We still need this...
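In case it helps, a minimal sketch of that fallback behavior; the path and helper name are hypothetical, not `spark-perf`'s actual code:

```python
import os


def read_slaves(path="/root/spark-ec2/slaves"):  # hypothetical path
    """Return the worker host list, defaulting to a single local worker
    when no slaves file exists (mirroring the 1.2+ behavior above)."""
    if not os.path.exists(path):
        return ["localhost"]
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]
```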
The `slaves` file issue was fixed by #29, but I think we still have to fix the `spark-env.sh` one.
I don't think we should include build archiving as part of `merge_spark_pr.py` because (a) it doesn't address the issue of bisecting over older builds, and (b) it would slow down...
I hadn't fully thought this out as a general service for public consumption; the original scope / idea here was something narrowly targeted at a specific problem that I encountered...
The numbers in the `results` array are the raw timing measurements for the runs of the test, listed in the order in which the tests ran (which is why the first...
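To make that concrete, here's a minimal sketch of summarizing one test's timings under the assumptions above: a record with a `results` list of per-run timings in run order, whose first entries are warm-up runs. The warm-up count and field names are illustrative, not a documented `spark-perf` schema:

```python
import statistics


def summarize(record, num_warmup_runs=2):
    """Summarize one test's raw timings, skipping the leading warm-up runs.

    `record` is assumed to be a dict with a "results" list of per-run
    timings in run order; `num_warmup_runs` is a hypothetical knob, since
    the real warm-up count depends on the test's configuration.
    """
    timings = record["results"]
    # The first runs are typically slower (JIT compilation, cold caches),
    # so drop them before computing summary statistics.
    measured = timings[num_warmup_runs:]
    return {
        "median": statistics.median(measured),
        "min": min(measured),
        "max": max(measured),
    }


# Example with made-up numbers:
print(summarize({"results": [12.1, 9.8, 8.9, 9.0, 8.8]}))
# {'median': 8.9, 'min': 8.8, 'max': 9.0}
```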
Is this a bug in `spark-perf`? A problem with Spark 1.5.1? I'm trying to figure out if this issue is actionable.