ARTSynchronized icon indicating copy to clipboard operation
ARTSynchronized copied to clipboard

[Question] How to use string keys?

Open ldionne opened this issue 8 years ago • 6 comments

The provided example uses keys that are integers. The associated loadKey function is:

void loadKey(TID tid, Key &key) {
    // Store the key of the tuple into the key vector
    // Implementation is database specific
    key.setKeyLen(sizeof(tid));
    reinterpret_cast<uint64_t *>(&key[0])[0] = __builtin_bswap64(tid);
}

And insertion goes like:

Key key;
loadKey(tid, key);
tree.insert(key, tid, t);

However, it's unclear to me how I can use strings as the keys in the tree, because I need to load the key from a TID, which is just an integer. I've read the companion paper and I think I understand how TID can be used as a tagged pointer, but I don't see how that helps me much. Any guidance would be highly appreciated.

ldionne avatar May 12 '17 17:05 ldionne

The Key class supports variable length keys. You can use this function to store a string: inline void Key::set(const char bytes[], const std::size_t length); https://github.com/flode/ARTSynchronized/blob/master/Key.h#L80

For example:

const char* keyString="some variable length key": key.set(keyString,sizeof(keyString));

flode avatar May 12 '17 19:05 flode

That's OK, but I still need to provide the loadKey function to the tree, and it's not clear to me how I can implement that function at all.

ldionne avatar May 12 '17 20:05 ldionne

Here's a minimal working reproduction:

#include <ROWEX/Tree.h>

#include <cstdint>
#include <string>
#include <vector>

void loadKey(TID tid, Key& key) {
  // What should I do here?
}

int main() {
  std::vector<std::pair<std::string, std::uint64_t>> dictionary;
  for (int i = 0; i != 100000; ++i) {
    dictionary.push_back({"hello" + std::to_string(i), i});
  }

  ART_ROWEX::Tree tree(loadKey);
  auto thread = tree.getThreadInfo();
  for (auto&& entry : dictionary) {
    std::string const& string = entry.first;
    std::uint64_t const value = entry.second;
    Key key;
    key.set(string.c_str(), string.size());
    tree.insert(key, value, thread);
  }
}

When I run this, I get:

Assertion failed: (level < key.getKeyLen()), function insert, file ../ROWEX/Tree.cpp, line 323.

ldionne avatar May 12 '17 20:05 ldionne

As this ART implementation doesn't store the full keys in the data structure, the loadkeys function needs to be implemented by you to provide the key given the tid.

In your example that could look like this:

#include <ROWEX/Tree.h>

#include <cstdint>
#include <string>
#include <vector>

std::vector<std::pair<std::string, std::uint64_t>> dictionary;
void loadKey(TID tid, Key& key) {
  // Assuming that tid is the index to the dictionary
  std::string const& string = dictionary[tid].first;
  key.set(string.c_str(), string.size());
}

int main() {
  for (int i = 0; i != 100000; ++i) {
    dictionary.push_back({"hello" + std::to_string(i), i});
  }

  ART_ROWEX::Tree tree(loadKey);
  auto thread = tree.getThreadInfo();
  for (auto&& entry : dictionary) {
    std::string const& string = entry.first;
    std::uint64_t const value = entry.second;
    Key key;
    key.set(string.c_str(), string.size());
    tree.insert(key, value, thread);
  }
}

flode avatar May 21 '17 20:05 flode

This still does not work:

./art_benchmark Assertion failed: (level < key.getKeyLen()), function insert, file ../ROWEX/Tree.cpp, line 323.

Do you have any test that uses strings instead of integers that I could take inspiration of? Was this ever tested beyond the (arguably trivial) example.cpp?

ldionne avatar May 23 '17 16:05 ldionne

Just found this again. I had another look at your example The assert triggered is this one: prevent inserting when prefix of key exists already Keys may not be prefixes of other keys in this data structure. One easy way to achieve this in your example is to make the null terminator part of the key: key.set(string.c_str(), string.size() + 1); Note the +1 at the end

flode avatar Dec 22 '20 15:12 flode