feat: improve performance of count_data

Open GehaFearless opened this issue 3 years ago • 1 comments

What problem does this PR solve?

Issue: https://github.com/apache/incubator-pegasus/issues/1090
This does the same job as an earlier PR: https://github.com/apache/incubator-pegasus/pull/728

Precisely counting the data in a large table can take minutes or even hours. However, returning the key-value pairs from the server to the client is unnecessary for this purpose.

What is changed and how does it work?

We only need the count itself, so the server only has to transfer the count to the client, not the detailed data. To use this mode, run "count_data -c -o" in pegasus_shell. In my test on onebox, it was 2x faster than before.
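
To illustrate the idea, here is a minimal sketch in Python. It is not the Pegasus implementation: the in-memory partition, `scan_batch`, and the `only_count` parameter are all hypothetical names made up for this example. It only shows why returning a per-batch count instead of the key-value pairs removes the payload that would otherwise cross the network.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for one table partition: key -> value.
Partition = Dict[str, bytes]


def scan_batch(
    partition: Partition, start: int, batch_size: int, only_count: bool
) -> Tuple[List[Tuple[str, bytes]], int, int]:
    """Toy 'server-side' scan handler (assumed, not a real Pegasus API).

    Returns (payload, count, next_start). When only_count is True the
    payload is empty and only the number of scanned records is returned.
    """
    keys = sorted(partition)[start:start + batch_size]
    next_start = start + len(keys)
    if only_count:
        return [], len(keys), next_start
    return [(k, partition[k]) for k in keys], len(keys), next_start


def count_data(
    partition: Partition, batch_size: int, only_count: bool
) -> Tuple[int, int]:
    """Toy 'client-side' loop: returns (total_count, payload_bytes_received)."""
    total = 0
    transferred = 0
    start = 0
    while True:
        kvs, n, start = scan_batch(partition, start, batch_size, only_count)
        if n == 0:
            break
        total += n
        # Bytes of key-value payload the client had to receive for this batch.
        transferred += sum(len(k) + len(v) for k, v in kvs)
    return total, transferred


if __name__ == "__main__":
    table = {f"key{i:04d}": b"x" * 100 for i in range(1000)}
    print(count_data(table, batch_size=100, only_count=False))
    print(count_data(table, batch_size=100, only_count=True))
```

Both calls report the same total, but in the count-only mode the client receives no key-value payload. Any real speedup depends on value sizes, network conditions, and how the server actually performs the scan.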

Tests

Unit test
Manual test (add detailed scripts or steps below)

Related changes

Need to update the documentation
Need to be included in the release note

Aug 01 '22 08:08 GehaFearless

Could you provide a performance comparison with the old version?

Aug 12 '22 07:08 acelyc111