klib
klib copied to clipboard
Array keys
Klib looks really interesting! I would love to use it in our Snabb Switch project. Snabb Switch is a simple and fast IP-networking toolkit based on LuaJIT.
Questions!
- Is there a LuaJIT binding anywhere for khash and/or ktree?
- Could these data structures support fixed-size array keys (with embedded zeros)?
I expect that we would want to have hundreds of instances of tables/trees, each with up to around a million items, and key types like int64
, char[16]
, char[40]
containing information that we have extracted from network packets (e.g. IPv6 addresses).
I have previously investigated Judy Arrays in 1250 LOC but I did not manage to match that to our needs.
Help to get started would be much appreciated!
+1
I also was thinking about some sort of bindings for khash to LuaJIT, so that single hash table could be used from both C and Lua code.
2015-03-10 13:08 GMT+03:00 Luke Gorrie [email protected]:
Klib looks really interesting! I would love to use it in our Snabb Switch https://github.com/SnabbCo/snabbswitch/ project. Snabb Switch is a simple and fast IP-networking toolkit based on LuaJIT.
Questions!
- Is there a LuaJIT binding anywhere for khash and/or ktree?
- Could these data structures support fixed-size array keys (with embedded zeros)?
I expect that we would want to have hundreds of instances of tables/trees, each with up to around a million items, and key types like int64, char[16], char[40] containing information that we have extracted from network packets (e.g. IPv6 addresses).
I have previously investigated Judy Arrays in 1250 LOC https://code.google.com/p/judyarray/ but I did not manage to match that to our needs.
Help to get started would be much appreciated!
— Reply to this email directly or view it on GitHub https://github.com/attractivechaos/klib/issues/44.
Regards, Konstantin
I have no idea about LuaJIT bindings, but you can use a fixed size array as a key with khash
if you wrap it in a struct.
This example should work:
// create hash map that maps 16 byte key to char*
typedef struct { char d[16]; } MyKey;
static inline int my_key_hash(MyKey k) { return k.d[0]*3+k.d[1]*7; /* some hash function */ }
static inline int my_keys_eq(MyKey k1, MyKey k2) { return memcmp(k1.d, k2.d, sizeof(k1.d)); }
KHASH_INIT(MyHash, MyKey, char*, 1, my_key_hash, my_keys_eq)
Example Usage:
int ret, is_missing;
khiter_t k;
khash_t(MyHash) *h = kh_init(MyHash);
// some values
MyKey key1 = {.d = {0}}; // key of all zeros
char value1[] = "hello";
// Add
k = kh_put(MyHash, h, key1, &ret);
kh_value(h, k) = value1;
// Find
k = kh_get(MyHash, h, key1);
is_missing = (k == kh_end(h));
kh_del(MyHash, h, k);
kh_destroy(MyHash, h);
Thanks for the feedback! This should get me started :-)
@noporpoise This example does not work the way I expect. The code compiles and runs cleanly, but is_missing
is true. It looks like the lookup fails. Is there a bug?
Ah yes, my bad! Sorry. Can you spot the error:
static inline int my_keys_eq(MyKey k1, MyKey k2) { return memcmp(k1.d, k2.d, sizeof(k1.d)); }
should be:
static inline int my_keys_eq(MyKey k1, MyKey k2) { return !memcmp(k1.d, k2.d, sizeof(k1.d)); }
The equals functions should return non-zero if they match, zero otherwise. Mine was doing the opposite. Fixed by adding a !
.
Thanks!
I have also been tweaking around and I have been infected by C macro fever. (I thought I had an immunity but it seems not.)
How about this for generic code to define a bunch of kinds of hashtables, each with a constant-size array key?
static inline array_hash(const char *a, int n) {
khint_t h = 0;
while (n--) { h = (h << 5) - h + (khint_t)*a++; }
return h;
}
#define ARRAY_TABLE_INIT(name, size, khval_t) \
static inline hash_##name##_at(const char *a) { array_hash(a, size); } \
static inline equal_##name##_at(const char *a, const char *b) { return memcmp(a, b, 16) == 0; } \
KHASH_INIT(name, const char *, khval_t, 1, hash_##name##_at, equal_##name##_at)
ARRAY_TABLE_INIT(array16, 16, char*)
ARRAY_TABLE_INIT(array32, 32, char*)
ARRAY_TABLE_INIT(array40, 40, char*)
I am such a noob :-). I see that I am mixing up pointers and values here. Thanks again for the getting started tips. Looks like I should read the khash
code properly.
Round 3!
I have made a first pass through reading khash. I am impressed! I have made a new attempt at making a library front-end that will be easy to access from LuaJIT. (The actual LuaJIT binding is still to be done.) I would be glad to have some review if anybody is inspired to do so :-) I may well still fundamentally misunderstand khash.
// hashtable.c -- Hashtable based on khash
//
// khash is a small, simple, fast hashtable library for C.
// This is a library to make khash useful via the LuaJIT FFI.
//
// khash makes extensive use of the C preprocessor. The macro
// KHASH_INIT expands into code for a unique hashtable with its own
// type of key, type of value (or none), hash function, and equality
// predicate.
//
// Notable aspects of this design:
//
// 1. Hashtable memory layout is specialized for the type. If you ask
// for 6-byte keys or 1-byte values then this will translate into
// an efficiently packed representation in memory.
//
// 2. Hashtable lookups expand into code specialized for the key
// type. GCC has the opportunity to make optimizations like
// constant-size loop unrolling when hashing and comparing.
//
// 3. Lookups and insertions expand into code with the appropriate C
// type declarations for keys and values. You don't need to cast
// back and forth between the real types and an opaque type like
// void*.
//
// And here is what that means when we use this from LuaJIT:
//
// 1. Hashtable memory layout being memory-efficient is good.
//
// 2. Lookups being aggressively optimized by GCC is good too. (We are
// calling the C lookup functions via FFI.)
//
// 3. Specialized C type declarations on lookup/insert is a
// nuisance. On the Lua level we care about the exact types of keys
// and values but on the C level we only care about their sizes. So
// in our C interface we only talk about the size of the keys and
// values, and we translate this into opaque types for khash.
//
// khash: http://attractivechaos.github.io/klib/
#include <assert.h>
#include <stdio.h>
#include "khash.h"
// Define a type for a fixed-size key.
#define ARRAY_KEY_TYPE(name, size) \
typedef struct { char k[size]; }__attribute__((packed)) name##_key_t;
// Define a type for a fixed-size value.
#define ARRAY_VALUE_TYPE(name, size) \
typedef struct { char v[size]; }__attribute__((packed)) name##_val_t;
// Define a hash function for a fixed-size key.
#define ARRAY_HASH_FUNC(name, size) \
static inline int hash_##name(name##_key_t a) { \
khint_t h = 0; \
int i; \
for (i = 0; i < size; i++) { \
h = (h << 5) - h + (khint_t)a.k[i]; \
} \
return h; \
}
// Define an equality test function for a fixed-size key.
#define ARRAY_EQUAL_FUNC(name, size) \
static inline int equal_##name(name##_key_t a, name##_key_t b) { \
return memcmp(a.k, b.k, size) == 0; \
}
#define ARRAY_TABLE_INIT(name, keysize, valuesize) \
ARRAY_KEY_TYPE(name, keysize) \
ARRAY_VALUE_TYPE(name, valuesize) \
ARRAY_HASH_FUNC(name, keysize) \
ARRAY_EQUAL_FUNC(name, keysize) \
KHASH_INIT(name, name##_key_t, name##_val_t, valuesize>0, hash_##name, equal_##name)
// Define a custom hashtable:
// named char16
// 16-byte keys
// pointer-size values
ARRAY_TABLE_INIT(char16, 16, sizeof(char*))
int main() {
int ret, is_missing;
khiter_t k;
khash_t(char16) *h = kh_init(char16);
// some values
char16_key_t key1 = {.k={0}}; // key of all zeros
char value1[] = "hello";
char16_key_t key2 = {.k={1}}; // key of all ones
char value2[] = "world";
// Add
k = kh_put(char16, h, key1, &ret);
memcpy(&kh_value(h, k), value1, sizeof(value1));
k = kh_put(char16, h, key2, &ret);
memcpy(&kh_value(h, k), value2, sizeof(value2));
// Find
k = kh_get(char16, h, key1);
printf("%s", (char*)&kh_value(h, k));
is_missing = (k == kh_end(h));
assert(!is_missing);
k = kh_get(char16, h, key2);
printf(" %s\n", (char*)&kh_value(h, k));
is_missing = (k == kh_end(h));
assert(!is_missing);
kh_del(char16, h, k);
kh_destroy(char16, h);
return 0;
}
I added lookup and insert operations as plain functions that should be callable from LuaJIT:
// Define a lookup function for LuaJIT to call.
// The lookup function returns null or a pointer to the value.
#define ARRAY_LOOKUP_FUNC(name, size) \
void *lookup_##name(khash_t(name) *h, name##_key_t key) { \
khiter_t v = kh_get(name, h, key); \
return v == kh_end(h) ? NULL : &kh_value(h, v); \
}
// Define an insert function for LuaJIT to call.
// The insert function returns a pointer to the value for initialization.
#define ARRAY_INSERT_FUNC(name, size) \
void *insert_##name(khash_t(name) *h, name##_key_t key) { \
int ret; \
return &kh_value(h, kh_put(name, h, key, &ret)); \
}
The usage is like this:
// insert values
memcpy(insert_char16(h, key1), value1, sizeof(value1));
memcpy(insert_char16(h, key2), value2, sizeof(value1));
// lookup values
printf("%s", (char*)lookup_char16(h, key1));
printf("%s", (char*)lookup_char16(h, key2));
Couple of reflections about using non-tiny keys (e.g. 40-byte structs):
- The lookup and insert operations are passing keys by value. This seems like gratuitous extra work compared with passing pointers. However, this is the interface of
khash_get()
. - Storing the keys inline in the keys structure may make resizing the table more expensive (?).
However, it is not clear that either issue represents a practical problem and so I choose not to worry about them for the moment.
Guys. This is great. If you get the luajit FFI bindings. I stumbled across it by accident must have missed it on the luajit distribution list. I think it merits a mention on the FFI page for the luajit. I know my problems with luajit normally revolve around the memory limitations so the more pre-built easy ways there are to bypass the problem the easier my life becomes..