kvrocks
kvrocks copied to clipboard
Enhancement: Too much `Slice::ToString()` when encoding the key
Search before asking
- [X] I had searched in the issues and found no similar issues.
Motivation
When encoding the user's key to kvrocks' internal key, there are too much rocksdb::Slice::ToString called, for example:
void ComposeNamespaceKey(const Slice& ns, const Slice& key, std::string *ns_key, bool slot_id_encoded) {
ns_key->clear();
PutFixed8(ns_key, static_cast<uint8_t>(ns.size()));
ns_key->append(ns.ToString());
if (slot_id_encoded) {
auto slot_id = GetSlotNumFromKey(key.ToString());
PutFixed16(ns_key, slot_id);
}
ns_key->append(key.ToString());
}
If the key is short, it just copy the slice on the stack. If the key is long, a on-heap string is allocated just for appending to a key. It will generate lots of little heap object, which will be a waste of both cpu and heap-memory.
Solution
Just use append(ns.data(), ns.size()) is ok.
Are you willing to submit a PR?
- [x] I'm willing to submit a PR!
Great catch. Feel free to submite patches!
https://github.com/apache/incubator-kvrocks/pull/707 @PragmaTwice
This pr solve part of this problem. There are still some callings in GetSlotNumFromKey. Maybe cast Slice to std::string_view, and using it's find and subslice can solve this. But string_view is introduced in C++17. Other solvings may include some string libraries. So I left them there.
Resolved by #707
@git-hulk IIUC #707 only resolves part of this issue?
@git-hulk IIUC #707 only resolves part of this issue?
I thought @mapleFU won't push forward to solve other parts since it needs to introduce new string library or C++17. Correct me or feel free to reopen this issue if I was missing something.
Which the help of string_view the code can be simplified:
uint16_t GetSlotNumFromKey(const std::string &key) {
auto tag = GetTagFromKey(key);
if (tag.empty()) {
tag = key;
}
auto crc = crc16(tag.data(), static_cast<int>(tag.size()));
return static_cast<int>(crc & HASH_SLOTS_MASK);
}
std::string GetTagFromKey(const std::string &key) {
auto left_pos = key.find("{");
if (left_pos == std::string::npos) return std::string();
auto right_pos = key.find("}", left_pos + 1);
// Note that we hash the whole key if there is nothing between {}.
if (right_pos == std::string::npos || right_pos <= left_pos + 1) {
return std::string();
}
return key.substr(left_pos + 1, right_pos - left_pos - 1);
}
GetTagFromKey will cause the key call a ToString(), then GetSlotNumFromKey will generate a std::string, just to compute the crc of the slice.
Also, I notice there is a Metadata::Decode, which will cast the slice to std::string for argument const std::string& ... In kvrocks, the metadata will be short, usally about 16bytes. Allocating them on heap is lightweight, but the interface is kind of ugly.
It's posible to support interface like:
uint16_t GetSlotNumFromKey(rocksdb::Slice key);
rocksdb::Slice GetTagFromKey(rocksdb::Slice key);
Status Decode(rocksdb::Slice key);
When calling them with std::string, just using Slice(s) is ok. But RocksDB didn't has enough string operations support, like find, which is required in GetTagFromKey. Although it has __cpp_lib_string_view and can cast rocksdb::Slice to std::string_view, which supports find and substr, I think introduce it might be trickey.
If we want to solve this problem, there might be some methods:
- Using
std::string_view. - folly, abseil, boost has library like
string_view, but this project didn't include them. - Using
rocksdb::Sliceorconst char*andsize_t length. Supportfindand other operations in project's utils.
I'm not familiar with handling C++ project, and I have no idea how to handle this problem. Can you give me some advice? @git-hulk @tisonkun @PragmaTwice
In my personal opinion, it's a bit heavy to introduce boost or C++17(require newer version compiler) to improve this situation. We can also reopen this issue if want to discuss further.
I think it is a long term target to update the standard version. GCC 4.8 is just too old and has no more support and backport from the gnu team.
I know that in some old distros (like debian stable or centos which has a default package repo with all antique version software) it is not a super simple job to get more up-to-date compiler, but I think it is not the reason to stuck the standard version since kvrocks is not a library which ABI breaking is insufferable.
For example, clickhouse uses c++20, mongodb uses c++17, and mysql will look for at least GCC 7 from devtoolset in an rhel or suse environment.
But of cause, we can use some workarounds like https://github.com/martinmoene/string-view-lite now, which is a lightweight header-only library.
Closed as inactive. If @mapleFU or @PragmaTwice want to continue the effort, feel free to reopen with an estimate or create a new issue.