kvrocks icon indicating copy to clipboard operation
kvrocks copied to clipboard

Enhancement: Too much `Slice::ToString()` when encoding the key

Open mapleFU opened this issue 3 years ago • 8 comments
trafficstars

Search before asking

  • [X] I had searched in the issues and found no similar issues.

Motivation

When encoding the user's key to kvrocks' internal key, there are too much rocksdb::Slice::ToString called, for example:

void ComposeNamespaceKey(const Slice& ns, const Slice& key, std::string *ns_key, bool slot_id_encoded) {
  ns_key->clear();

  PutFixed8(ns_key, static_cast<uint8_t>(ns.size()));
  ns_key->append(ns.ToString());

  if (slot_id_encoded) {
    auto slot_id = GetSlotNumFromKey(key.ToString());
    PutFixed16(ns_key, slot_id);
  }

  ns_key->append(key.ToString());
}

If the key is short, it just copy the slice on the stack. If the key is long, a on-heap string is allocated just for appending to a key. It will generate lots of little heap object, which will be a waste of both cpu and heap-memory.

Solution

Just use append(ns.data(), ns.size()) is ok.

Are you willing to submit a PR?

  • [x] I'm willing to submit a PR!

mapleFU avatar Jul 03 '22 07:07 mapleFU

Great catch. Feel free to submite patches!

PragmaTwice avatar Jul 03 '22 08:07 PragmaTwice

https://github.com/apache/incubator-kvrocks/pull/707 @PragmaTwice This pr solve part of this problem. There are still some callings in GetSlotNumFromKey. Maybe cast Slice to std::string_view, and using it's find and subslice can solve this. But string_view is introduced in C++17. Other solvings may include some string libraries. So I left them there.

mapleFU avatar Jul 03 '22 09:07 mapleFU

Resolved by #707

git-hulk avatar Jul 03 '22 13:07 git-hulk

@git-hulk IIUC #707 only resolves part of this issue?

tisonkun avatar Jul 03 '22 14:07 tisonkun

@git-hulk IIUC #707 only resolves part of this issue?

I thought @mapleFU won't push forward to solve other parts since it needs to introduce new string library or C++17. Correct me or feel free to reopen this issue if I was missing something.

git-hulk avatar Jul 03 '22 14:07 git-hulk

Which the help of string_view the code can be simplified:

uint16_t GetSlotNumFromKey(const std::string &key) {
  auto tag = GetTagFromKey(key);
  if (tag.empty()) {
    tag = key;
  }

  auto crc = crc16(tag.data(), static_cast<int>(tag.size()));
  return static_cast<int>(crc & HASH_SLOTS_MASK);
}

std::string GetTagFromKey(const std::string &key) {
  auto left_pos = key.find("{");
  if (left_pos == std::string::npos) return std::string();
  auto right_pos = key.find("}", left_pos + 1);
  // Note that we hash the whole key if there is nothing between {}.
  if (right_pos == std::string::npos || right_pos <= left_pos + 1) {
    return std::string();
  }

  return key.substr(left_pos + 1, right_pos - left_pos - 1);
}

GetTagFromKey will cause the key call a ToString(), then GetSlotNumFromKey will generate a std::string, just to compute the crc of the slice.

Also, I notice there is a Metadata::Decode, which will cast the slice to std::string for argument const std::string& ... In kvrocks, the metadata will be short, usally about 16bytes. Allocating them on heap is lightweight, but the interface is kind of ugly.

It's posible to support interface like:

uint16_t GetSlotNumFromKey(rocksdb::Slice key);
rocksdb::Slice GetTagFromKey(rocksdb::Slice key);
Status Decode(rocksdb::Slice key);

When calling them with std::string, just using Slice(s) is ok. But RocksDB didn't has enough string operations support, like find, which is required in GetTagFromKey. Although it has __cpp_lib_string_view and can cast rocksdb::Slice to std::string_view, which supports find and substr, I think introduce it might be trickey.

If we want to solve this problem, there might be some methods:

  1. Using std::string_view.
  2. folly, abseil, boost has library like string_view, but this project didn't include them.
  3. Using rocksdb::Slice or const char* and size_t length. Support find and other operations in project's utils.

I'm not familiar with handling C++ project, and I have no idea how to handle this problem. Can you give me some advice? @git-hulk @tisonkun @PragmaTwice

mapleFU avatar Jul 03 '22 16:07 mapleFU

In my personal opinion, it's a bit heavy to introduce boost or C++17(require newer version compiler) to improve this situation. We can also reopen this issue if want to discuss further.

git-hulk avatar Jul 04 '22 01:07 git-hulk

I think it is a long term target to update the standard version. GCC 4.8 is just too old and has no more support and backport from the gnu team.

I know that in some old distros (like debian stable or centos which has a default package repo with all antique version software) it is not a super simple job to get more up-to-date compiler, but I think it is not the reason to stuck the standard version since kvrocks is not a library which ABI breaking is insufferable.

For example, clickhouse uses c++20, mongodb uses c++17, and mysql will look for at least GCC 7 from devtoolset in an rhel or suse environment.

But of cause, we can use some workarounds like https://github.com/martinmoene/string-view-lite now, which is a lightweight header-only library.

PragmaTwice avatar Jul 05 '22 05:07 PragmaTwice

Closed as inactive. If @mapleFU or @PragmaTwice want to continue the effort, feel free to reopen with an estimate or create a new issue.

tisonkun avatar Sep 11 '22 03:09 tisonkun