
Benchmark section removed from README

Open themisir opened this issue 3 years ago • 4 comments

themisir • Feb 03 '22 20:02

@themisir Why are you considering removing the benchmark?

PawlikMichal25 • Mar 26 '22 20:03

> @themisir Why are you considering removing the benchmark?

It does not reflect the current state of the ecosystem. The benchmarks were old and a lot has changed since then. Also, some points might be "unfair" to compare on a benchmark basis, so I don't want people to make decisions based on misleading data.

Also, some benchmark steps were not tested correctly back then, or something else is off: when I run the benchmark myself the results are different, and for some reason lazy boxes performed worse than regular boxes.

themisir • Mar 27 '22 13:03

It's a fair point that we should keep the benchmark up to date. I think it's normal for a benchmark to compare one specific thing, though, so it's the developer's responsibility to judge whether a given benchmark relates to their use case.

I'm not sure how it is right now, but I actually adjusted and ran the benchmark myself for my article. The results were somewhat different, but the conclusion was the same: Hive is fast.

Perhaps the benchmark should be updated or described differently, but I think the responsible thing is to include some benchmark if we're claiming in the documentation that Hive is "blazing fast".

PS: I'm just a random guy from the internet, so sorry for "sticking my nose" in here, but I've always liked Hive and considered speed to be one of its most important benefits :)

PawlikMichal25 • Mar 27 '22 14:03

Yeah, I've updated the benchmark and added new comparison points. You can see the final result here. But what's interesting is that lazy reads were apparently way slower than what's shown in the current README. I think that's because Hive has to read data from the file when doing lazy reads, which is the whole point (otherwise Hive has to read all the data into memory, which might be an issue if the data source is big). But that also means that, for efficiency, more conventional solutions like SQLite make more sense than Hive for data-heavy workloads.
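To illustrate the trade-off described above, here is a minimal Python sketch. It is not Hive's implementation or API: `EagerBox`, `LazyBox`, `write_records`, and the one-JSON-record-per-line file layout are all hypothetical names and assumptions made up for this example. The eager box reads the whole file into memory once, so each read is a dictionary lookup; the lazy box keeps only key offsets in memory and goes back to the file on every read.

```python
import json
import os
import tempfile


def write_records(path, records):
    """Write one JSON record per line; return {key: (offset, length)}."""
    index = {}
    with open(path, "wb") as f:
        for key, value in records.items():
            line = json.dumps(value).encode("utf-8") + b"\n"
            index[key] = (f.tell(), len(line))
            f.write(line)
    return index


class EagerBox:
    """Hypothetical 'regular' box: loads every value into memory at open."""

    def __init__(self, path, index):
        with open(path, "rb") as f:
            data = f.read()
        self._values = {
            key: json.loads(data[off:off + length])
            for key, (off, length) in index.items()
        }

    def get(self, key):
        # No file access here: memory cost is paid up front.
        return self._values[key]


class LazyBox:
    """Hypothetical 'lazy' box: keeps only offsets, reads the file per get."""

    def __init__(self, path, index):
        self._path = path
        self._index = index

    def get(self, key):
        # Every read opens the file, seeks, and decodes one record.
        off, length = self._index[key]
        with open(self._path, "rb") as f:
            f.seek(off)
            return json.loads(f.read(length))


def demo():
    records = {f"user{i}": {"id": i, "name": f"name{i}"} for i in range(1000)}
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        index = write_records(path, records)
        eager = EagerBox(path, index)
        lazy = LazyBox(path, index)
        return eager.get("user42"), lazy.get("user42")
    finally:
        os.remove(path)
```

Both boxes return the same value for a key; the difference is where the cost lands. The eager variant pays in memory for the whole data set, while the lazy variant pays in file I/O on each read, which is one plausible reason lazy reads would come out slower in a read-heavy benchmark.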

> PS: I'm just a random guy from the internet, so sorry for "sticking my nose" in here, but I've always liked Hive and considered speed to be one of its most important benefits :)

Don't worry, I'm actually not sure about removing it either, which is why I didn't want to merge this PR yet. I think maybe I should update the data instead of removing it.

themisir • Mar 27 '22 15:03