rizin
rizin copied to clipboard
`HtPx` hash tables cannot be used with arbitrary pointers as keys.
Work environment
Questions | Answers |
---|---|
OS/arch/bits (mandatory) | Debian x86_64 |
File format of the file you reverse (mandatory) | - |
Architecture/bits of the file (mandatory) | - |
rizin -v full output, not truncated (mandatory) |
rizin 0.7.0 @ linux-x86-64 |
commit: dc753bba5822378bc1b871638ed232cfdb3815 |
Expected behavior
Every unique pointer is a unique key.
Actual behavior
Pointers never added are found in the hastable.
Steps to reproduce the behavior
See: https://github.com/rizinorg/rizin/pull/4260
Additional Logs, screenshots, source code, configuration dump, ...
This breaks completely SetP
. ~~But it isn't used anywhere anyways~~
It is used in agraph.c
.
As an indicator where problems can occur: The list with usages of zero initialized pointer as keys hash tables:
> grep -rnIE "ht_p[up]_new0\(" librz/
librz/util/set.c:9: return ht_pp_new0();
librz/util/lib.c:33: lib->opened_dirs = ht_pu_new0();
librz/analysis/analysis.c:95: analysis->ht_name_fun = ht_pp_new0();
librz/analysis/class.c:1287: HtPP /*<char *name, RzGraphNode *node>*/ *hashmap = ht_pp_new0();
librz/type/type.c:1130: HtPP *used_types = ht_pp_new0(); // use a hash table to keep track of unfolded types
librz/type/parser/c_cpp_parser.c:36: state->types = ht_pp_new0();
librz/type/parser/c_cpp_parser.c:41: state->callables = ht_pp_new0();
librz/type/parser/c_cpp_parser.c:46: state->forward = ht_pp_new0();
librz/type/serialize_functions.c:136: HtPP *type_str_cache = ht_pp_new0(); // cache from a known C type extr to its RzType representation for skipping the parser if possible
librz/core/cmd/cmd_analysis.c:4270: HtPU *ht = ht_pu_new0();
librz/core/cmd/cmd_analysis.c:4297: HtPU *keys_set = ht_pu_new0();
librz/core/cmd/cmd_analysis.c:4306: HtPU *db = ht_pu_new0();
librz/core/cmd/cmd_api.c:228: cmd->ht_cmds = ht_pp_new0();
librz/core/cmd/cmd_info.c:87: HtPP *files = ht_pp_new0();
librz/core/cmd/cmd_eval.c:142: HtPU *themes = ht_pu_new0();
librz/config/config.c:495: cfg->ht = ht_pp_new0();
librz/config/serialize_config.c:52: ctx.exclude = ht_pp_new0();
librz/debug/trace.c:21: t->ht = ht_pp_new0();
librz/debug/trace.c:304: t->ht = ht_pp_new0();
librz/reg/profile.c:356: reg->regset[t].ht_regs = ht_pp_new0();
librz/include/rz_util/rz_serialize.h:49: return ht_pp_new0();
librz/bin/format/mach0/dyldcache.c:567: HtPU *path_to_idx = ht_pu_new0();
librz/bin/format/mach0/mach0.c:2524: HtPP *hash = ht_pp_new0();
librz/bin/format/elf/elf_symbols.c:404: HtPP *name_set = ht_pp_new0();
librz/bin/bobj_process_class.c:88: o->name_to_class_object = ht_pp_new0();
librz/bin/bobj_process_class.c:89: o->glue_to_class_method = ht_pp_new0();
librz/bin/bobj_process_class.c:90: o->glue_to_class_field = ht_pp_new0();
librz/bin/bobj_process_section.c:36: HtPP *filter_db = bin->filter ? ht_pp_new0() : NULL;
librz/bin/bobj_process_symbol.c:98: o->import_name_symbols = ht_pp_new0();
HtPx
is supposed to be used with strings as keys. As @wargio suggested, the proper fix would be in this case to introduce HtSx
(S
for strings).
While HtPx
should be a HtUx
which simply casts the key pointers to ut64
.
IIRC I used zeroized options for SetP in agraph.c so pointers are treated as integers Yeah ugly hack, I know.
Ah nice. That is why it worked :D
As long as we don't use SetP
until it's refactored, it's fine (with the one exception of yours of course).
Wait, there are options to modify how the key should be compared. I think the doc is confusing, but it should be possible.
typedef struct Ht_(options_t) {
HT_(ListComparator)
cmp; // Function for comparing values. Returns 0 if eq.
In ht_inc.c:
static inline bool is_kv_equal(HtName_(Ht) * ht, const KEY_TYPE key, const ut32 key_len, const HT_(Kv) * kv) {
if (key_len != kv->key_len) {
return false;
}
bool res = key == kv->key;
if (!res && ht->opt.cmp) {
res = !ht->opt.cmp(key, kv->key);
}
return res;
}
ht_pp.c uses:
static HtName_(Ht) * internal_ht_default_new(ut32 size, ut32 prime_idx, HT_(DupValue) valdup, HT_(KvFreeFunc) pair_free, HT_(CalcSizeV) calcsizeV) {
HT_(Options)
opt = {
.cmp = (HT_(ListComparator))strcmp,
so the keys are compared with strcmp
, that's probably why you get some weird results. If you want to just compare the raw values (as int), don't use HtP
, but use HtUx
. Otherwise, create a custom Ht
with .cmp
set to something that makes sense for you.
yes, but the current implementation is very hard to use. i would make a proper implementation and clear definition of their usages. also HtUx does not implement any hash function therefore the table is handled as an array.
@wargio Interestingly not, I had a case when it used the buckets. Push the test for it in a second. Which just strengthens your point to make the functioning of Ht clear to the user.
bool test_htuu_uses_buckets(void) {
HtUU *ht = ht_uu_new0();
ht_uu_insert(ht, 0, 1);
ht_uu_insert(ht, 1, 1);
ht_uu_insert(ht, 2, 1);
ht_uu_insert(ht, 3, 1);
printf("\nht->count = %d\n", ht->count);
ht_uu_insert(ht, 10000000, 1);
printf("ht->count = %d\n", ht->count);
for (int i = 0; i < ht->size; ++i) {
printf("ht->table[%d].count = %d => [ ", i, ht->table[i].count);
for (ut32 j = 0; j < ht->table[i].count; ++j) {
printf("%llu ", ht->table[i].arr[j].key);
}
printf("]\n");
}
// Output:
// ht->count = 4
// ht->count = 5
// ht->table[0].count = 1 => [ 0 ]
// ht->table[1].count = 1 => [ 1 ]
// ht->table[2].count = 1 => [ 2 ]
// ht->table[3].count = 2 => [ 3 10000000 ]
// ht->table[4].count = 0 => [ ]
// ht->table[5].count = 0 => [ ]
// ht->table[6].count = 0 => [ ]
mu_end;
}
HtUU should definitely use the bucketing.
I can improve the doc for Ht if needed as I wrote most of it.
The hash function is just used to improve the way the keys are spread into buckets, so that (optimally) each bucket has mostly the same number of items. If the hash function is not provided we just use the %
operator IIRC.
Improved Ht
docs would be awesome!
The situation is better now, with the introduction of HtSP
, HtSS
, HtSU
, SetS
.
Yes! Also the _new
functions enforce to define the behavior if necessary. So I would consider this as closed.