ARTSynchronized
ARTSynchronized copied to clipboard
[Question] How to use string keys?
The provided example uses keys that are integers. The associated loadKey function is:
void loadKey(TID tid, Key &key) {
// Store the key of the tuple into the key vector
// Implementation is database specific
key.setKeyLen(sizeof(tid));
reinterpret_cast<uint64_t *>(&key[0])[0] = __builtin_bswap64(tid);
}
And insertion goes like:
Key key;
loadKey(tid, key);
tree.insert(key, tid, t);
However, it's unclear to me how I can use strings as the keys in the tree, because I need to load the key from a TID, which is just an integer. I've read the companion paper and I think I understand how TID can be used as a tagged pointer, but I don't see how that helps me much. Any guidance would be highly appreciated.
The Key class supports variable length keys. You can use this function to store a string: inline void Key::set(const char bytes[], const std::size_t length); https://github.com/flode/ARTSynchronized/blob/master/Key.h#L80
For example:
const char* keyString="some variable length key": key.set(keyString,sizeof(keyString));
That's OK, but I still need to provide the loadKey function to the tree, and it's not clear to me how I can implement that function at all.
Here's a minimal working reproduction:
#include <ROWEX/Tree.h>
#include <cstdint>
#include <string>
#include <vector>
void loadKey(TID tid, Key& key) {
// What should I do here?
}
int main() {
std::vector<std::pair<std::string, std::uint64_t>> dictionary;
for (int i = 0; i != 100000; ++i) {
dictionary.push_back({"hello" + std::to_string(i), i});
}
ART_ROWEX::Tree tree(loadKey);
auto thread = tree.getThreadInfo();
for (auto&& entry : dictionary) {
std::string const& string = entry.first;
std::uint64_t const value = entry.second;
Key key;
key.set(string.c_str(), string.size());
tree.insert(key, value, thread);
}
}
When I run this, I get:
Assertion failed: (level < key.getKeyLen()), function insert, file ../ROWEX/Tree.cpp, line 323.
As this ART implementation doesn't store the full keys in the data structure, the loadkeys function needs to be implemented by you to provide the key given the tid.
In your example that could look like this:
#include <ROWEX/Tree.h>
#include <cstdint>
#include <string>
#include <vector>
std::vector<std::pair<std::string, std::uint64_t>> dictionary;
void loadKey(TID tid, Key& key) {
// Assuming that tid is the index to the dictionary
std::string const& string = dictionary[tid].first;
key.set(string.c_str(), string.size());
}
int main() {
for (int i = 0; i != 100000; ++i) {
dictionary.push_back({"hello" + std::to_string(i), i});
}
ART_ROWEX::Tree tree(loadKey);
auto thread = tree.getThreadInfo();
for (auto&& entry : dictionary) {
std::string const& string = entry.first;
std::uint64_t const value = entry.second;
Key key;
key.set(string.c_str(), string.size());
tree.insert(key, value, thread);
}
}
This still does not work:
./art_benchmark Assertion failed: (level < key.getKeyLen()), function insert, file ../ROWEX/Tree.cpp, line 323.
Do you have any test that uses strings instead of integers that I could take inspiration of? Was this ever tested beyond the (arguably trivial) example.cpp?
Just found this again. I had another look at your example
The assert triggered is this one: prevent inserting when prefix of key exists already
Keys may not be prefixes of other keys in this data structure. One easy way to achieve this in your example is to make the null terminator part of the key:
key.set(string.c_str(), string.size() + 1); Note the +1 at the end