incubator-pegasus
feat: improve performance of count_data
What problem does this PR solve?
issue: https://github.com/apache/incubator-pegasus/issues/1090 same job as before: https://github.com/apache/incubator-pegasus/pull/728
Precisely counting the data in a large table can take minutes or even hours. However, returning the key-values from the server to the client is unnecessary for this.
What is changed and how does it work?
We only need the count of the data, so the server only has to transfer the count to the client, not the detailed data. To use this, run "count_data -c -o" in pegasus_shell. In my test on onebox, it was about 2x faster than before.
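To illustrate the idea, here is a minimal sketch, not the Pegasus implementation. `ToyPartition`, `scan_rows`, `scan_counts`, and the 8-byte size assumed for each count response are all hypothetical names and numbers chosen for the example. It compares the payload moved when the client counts rows it has fetched against the payload moved when the server returns only a per-batch count.

```python
from typing import Dict, Iterator, List, Tuple


class ToyPartition:
    """Hypothetical in-memory stand-in for one server partition (not Pegasus code)."""

    def __init__(self, rows: Dict[bytes, bytes]) -> None:
        self._rows = sorted(rows.items())

    def scan_rows(self, batch_size: int) -> Iterator[List[Tuple[bytes, bytes]]]:
        # Old behaviour in this sketch: every batch carries the full key-values.
        for start in range(0, len(self._rows), batch_size):
            yield self._rows[start:start + batch_size]

    def scan_counts(self, batch_size: int) -> Iterator[int]:
        # New behaviour in this sketch: every batch carries only its row count.
        for start in range(0, len(self._rows), batch_size):
            yield len(self._rows[start:start + batch_size])


def count_by_fetching_rows(partition: ToyPartition, batch_size: int = 100) -> Tuple[int, int]:
    """Return (row count, payload bytes) when the client counts fetched rows."""
    total = 0
    payload = 0
    for batch in partition.scan_rows(batch_size):
        total += len(batch)
        payload += sum(len(key) + len(value) for key, value in batch)
    return total, payload


def count_on_server(partition: ToyPartition, batch_size: int = 100) -> Tuple[int, int]:
    """Return (row count, payload bytes) when the server sends only counts."""
    total = 0
    payload = 0
    for batch_count in partition.scan_counts(batch_size):
        total += batch_count
        payload += 8  # assumption: each count travels as one 8-byte integer
    return total, payload


if __name__ == "__main__":
    rows = {f"key{i:04d}".encode(): b"x" * 100 for i in range(1000)}
    partition = ToyPartition(rows)
    print(count_by_fetching_rows(partition))
    print(count_on_server(partition))
```

Both functions return the same count; only the payload differs. The real speedup depends on value sizes, batch sizes, and network and disk costs, so the sketch says nothing about the 2x figure measured on onebox.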
Tests
- Unit test
- Manual test (add detailed scripts or steps below)
Related changes
- Need to update the documentation
- Need to be included in the release note
Could you give some performance comparison with the old version?