`ht_get_hash` not working correctly
`ht_get_hash` keeps returning the same slot for consecutive attempts, up to very high attempt counts, in some cases. Example:
(gdb) p ht_get_hash("mybrujmaonutk", 211, 0)
$32 = 38
(gdb) p ht_get_hash("mybrujmaonutk", 211, 1)
$33 = 38
(gdb) p ht_get_hash("mybrujmaonutk", 211, 2)
$34 = 38
// and so on up to ...
(gdb) p ht_get_hash("mybrujmaonutk", 211, 10177647)
$35 = 38
(gdb) p ht_get_hash("mybrujmaonutk", 211, 10177648)
$36 = -13
My definitions:
static int ht_hash(const char* s, const int a, const int m) {
    long hash = 0;
    const int len_s = strlen(s);
    for (int i = 0; i < len_s; i++) {
        hash += (long)pow(a, len_s - (i+1)) * s[i];
        hash = hash % m;
    }
    return (int)hash;
}

static int ht_get_hash(
    const char* s, const int num_buckets, const int attempt
) {
    const int hash_a = ht_hash(s, HT_PRIME_1, num_buckets);
    const int hash_b = ht_hash(s, HT_PRIME_2, num_buckets);
    return (hash_a + (attempt * (hash_b + 1))) % num_buckets;
}
What have you set HT_PRIME_1 and HT_PRIME_2 to? I am facing the same issue.
I have found the solution to this issue. Make sure the HT_PRIME values are defined as follows:
#define HT_PRIME_1 151
#define HT_PRIME_2 163
With these values, ht_get_hash no longer returns negative indices and also finds a slot in far fewer attempts.
@nouritsu my HT_PRIME_1 and HT_PRIME_2 were set to 129 and 131. After changing them to 151 and 163, the issue still persists. See my driver code below to reproduce it:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include "hash_table.h"
char* rand_word(int min_len, int max_len) {
    int i, length;
    char c;

    // seed random
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    /* using nano-seconds instead of seconds */
    srand((time_t)ts.tv_nsec);

    length = rand() % (max_len - min_len + 1) + min_len;
    char* word = calloc((size_t)(length+1), sizeof(char));
    for (i = 0; i < length; i++) {
        c = 'a' + rand() % 26;
        word[i] = c;
    }
    word[i] = 0;
    return word;
}

int main() {
    ht_hash_table* ht = ht_new();
    for (int i = 0; i < 5000; i++) {
        char *k = rand_word(2, 30);
        char *v = rand_word(10, 100);
        ht_insert(ht, k, v);
    }
    ht_del_hash_table(ht);
    return 0;
}
Do you ever get out-of-bounds writes?
I see you get negative indices🤔
@winterrdog yes, the program ends with "Segmentation fault" in some cases (not always, though). So a negative index sometimes "works" (or at least doesn't cause a segmentation fault) and sometimes doesn't. I have no idea how this happens.
The reason is an edge case with hash_b that goes unhandled: when hash_b is num_buckets - 1, (hash_b + 1) is a multiple of num_buckets, so the attempt term is always 0 modulo num_buckets and every attempt probes the same slot. I dealt with it in my code. Left unhandled, it can lead to very weird hashes and negative indices, which should never happen at all.
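You can check this against the numbers in the original post: the slot never changing means attempt * (hash_b + 1) is always 0 mod 211, which can only happen if hash_b + 1 is a multiple of 211, i.e. hash_b == 210 for that key. The eventual negative index comes from signed int overflow: 10177648 * 211 = 2,147,483,728, just past INT_MAX (2,147,483,647), so on a typical two's-complement platform the product wraps to a large negative number and the final % 211 yields exactly the -13 seen in gdb.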
You could deal with it in two ways: either always check that the calculated index is within the bounds of the table's items member before using it to access any element, something like this:
static int is_valid_index(ht_hash_table* table, int index)
{
    return index >= 0 && index < table->size;
}
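For example, the check could go before every access to items in the insert loop. This is only a rough sketch: I am assuming a probing loop along the lines of the tutorial's ht_insert, so ht_new_item, ht->items, ht->size and ht->count are the tutorial's names, not code from my repo.

void ht_insert(ht_hash_table* ht, const char* key, const char* value) {
    ht_item* item = ht_new_item(key, value);
    int index = ht_get_hash(item->key, ht->size, 0);
    int attempt = 1;
    /* keep probing while the index is in bounds and the slot is occupied */
    while (is_valid_index(ht, index) && ht->items[index] != NULL) {
        index = ht_get_hash(item->key, ht->size, attempt);
        attempt++;
    }
    if (!is_valid_index(ht, index)) {
        return; /* refuse to write out of bounds instead of segfaulting */
    }
    ht->items[index] = item;
    ht->count++;
}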
or
do a single check like the one I did in the ht_get_hash function in my code repo.
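Roughly, the idea of that check is something like the sketch below. Resetting hash_b to 0 and reducing attempt modulo num_buckets first are just one way to do it, so treat this as an illustration rather than the exact code from the repo:

static int ht_get_hash(
    const char* s, const int num_buckets, const int attempt
) {
    const int hash_a = ht_hash(s, HT_PRIME_1, num_buckets);
    int hash_b = ht_hash(s, HT_PRIME_2, num_buckets);
    // edge case: hash_b == num_buckets - 1 makes (hash_b + 1) a multiple
    // of num_buckets, so every attempt would probe the same slot
    if (hash_b == num_buckets - 1) {
        hash_b = 0;
    }
    // reduce attempt first so the multiplication stays below
    // num_buckets * num_buckets and cannot overflow int for table sizes
    // like 211 (that overflow is what produced the -13 above)
    return (hash_a + (attempt % num_buckets) * (hash_b + 1)) % num_buckets;
}

For the example key above (hash_a = 38, hash_b = 210 with 211 buckets), the probes now go 38, 39, 40, ... instead of staying at 38 and eventually turning negative.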