parquet-java icon indicating copy to clipboard operation
parquet-java copied to clipboard

PARQUET-315: Add PARQUET_1_0 and non-repeated data performance tests …

Open spena opened this issue 9 years ago • 10 comments

Here's a list of changes I did on the parquet-benchmarks module:

  • Add variable to test PARQUET_1_0 format
  • Add variable to generate random data for the tests
  • Move part of the DataGenerator.generate() code to constructors so that we measure time when when writing data only.
  • Add annotation to display AverageTime. The Throughput mode was always displaying less than 1 op/s. So I thought time was a better mode here.

@nezihyigitbasi @danielcweeks You did a good job starting the parquet-benchmarks module. Could you please review these changes?

spena avatar Jun 22 '15 18:06 spena

@spena thanks for taking the time to create this PR. Can you also update the README of the benchmark module?

nezihyigitbasi avatar Jun 22 '15 18:06 nezihyigitbasi

@spena Added a few comments. Once you update the README and address the comments, I will be happy to give it a try. Thanks again!

nezihyigitbasi avatar Jun 22 '15 19:06 nezihyigitbasi

Thanks @nezihyigitbasi for the feedback. I fixed the issues you mentioned. You will be able to generate data this time.

spena avatar Jun 22 '15 21:06 spena

Thanks @nezihyigitbasi I added the changes based on your feedback. I liked your explanation about random data, so I replaced the old paragraph with yours.

I tried to figured out why the benchmark cannot be executed when compiling with hadoop-2, but I did not know why. So, I just replace the README to use the command with out the profile.

One thing. The read-benchmark.sh tests will fail when using parquetVersion=v2 and randomData=true due to the issue in PARQUET-152. If the patch is applied, then all tests run fine.

spena avatar Jun 23 '15 18:06 spena

@spena To keep the fixes to the existing benchmark module separate I created a new PR: https://github.com/apache/parquet-mr/pull/226

Yep I verified that your change makes the read-benchmark go through.

nezihyigitbasi avatar Jun 23 '15 18:06 nezihyigitbasi

@spena I only added a few minor comments to the README. Other than that LGTM. @rdblue I have provided feedback to @spena and verified his changes. When he addresses my final comments this patch LGTM. It's all yours now.

nezihyigitbasi avatar Jun 23 '15 18:06 nezihyigitbasi

Thanks @nezihyigitbasi. I added the minor changes you mentioned.

Once #226 is committed, I will merge the code to my branch, and check it is applied correctly.

spena avatar Jun 23 '15 19:06 spena

@nezihyigitbasi @rdblue I updated the branch with the latest changes of master (includes #226).

No more work to do here.

spena avatar Jul 09 '15 18:07 spena

Thanks @spena, I'll take a look at this today.

rdblue avatar Jul 09 '15 19:07 rdblue

@rdblue did you want to merge this? @spena do you need to merge master

julienledem avatar Dec 12 '15 23:12 julienledem