rizin icon indicating copy to clipboard operation
rizin copied to clipboard

`HtPx` hash tables cannot be used with arbitrary pointers as keys.

Open Rot127 opened this issue 1 year ago • 10 comments

Work environment

Questions Answers
OS/arch/bits (mandatory) Debian x86_64
File format of the file you reverse (mandatory) -
Architecture/bits of the file (mandatory) -
rizin -v full output, not truncated (mandatory) rizin 0.7.0 @ linux-x86-64
commit: dc753bba5822378bc1b871638ed232cfdb3815

Expected behavior

Every unique pointer is a unique key.

Actual behavior

Pointers never added are found in the hastable.

Steps to reproduce the behavior

See: https://github.com/rizinorg/rizin/pull/4260

Additional Logs, screenshots, source code, configuration dump, ...

This breaks completely SetP. ~~But it isn't used anywhere anyways~~ It is used in agraph.c.

Rot127 avatar Feb 18 '24 10:02 Rot127

As an indicator where problems can occur: The list with usages of zero initialized pointer as keys hash tables:

> grep -rnIE "ht_p[up]_new0\(" librz/ 

librz/util/set.c:9:	return ht_pp_new0();
librz/util/lib.c:33:	lib->opened_dirs = ht_pu_new0();
librz/analysis/analysis.c:95:	analysis->ht_name_fun = ht_pp_new0();
librz/analysis/class.c:1287:	HtPP /*<char *name, RzGraphNode *node>*/ *hashmap = ht_pp_new0();
librz/type/type.c:1130:	HtPP *used_types = ht_pp_new0(); // use a hash table to keep track of unfolded types
librz/type/parser/c_cpp_parser.c:36:		state->types = ht_pp_new0();
librz/type/parser/c_cpp_parser.c:41:		state->callables = ht_pp_new0();
librz/type/parser/c_cpp_parser.c:46:	state->forward = ht_pp_new0();
librz/type/serialize_functions.c:136:	HtPP *type_str_cache = ht_pp_new0(); // cache from a known C type extr to its RzType representation for skipping the parser if possible
librz/core/cmd/cmd_analysis.c:4270:	HtPU *ht = ht_pu_new0();
librz/core/cmd/cmd_analysis.c:4297:	HtPU *keys_set = ht_pu_new0();
librz/core/cmd/cmd_analysis.c:4306:		HtPU *db = ht_pu_new0();
librz/core/cmd/cmd_api.c:228:	cmd->ht_cmds = ht_pp_new0();
librz/core/cmd/cmd_info.c:87:		HtPP *files = ht_pp_new0();
librz/core/cmd/cmd_eval.c:142:	HtPU *themes = ht_pu_new0();
librz/config/config.c:495:	cfg->ht = ht_pp_new0();
librz/config/serialize_config.c:52:		ctx.exclude = ht_pp_new0();
librz/debug/trace.c:21:	t->ht = ht_pp_new0();
librz/debug/trace.c:304:	t->ht = ht_pp_new0();
librz/reg/profile.c:356:		reg->regset[t].ht_regs = ht_pp_new0();
librz/include/rz_util/rz_serialize.h:49:	return ht_pp_new0();
librz/bin/format/mach0/dyldcache.c:567:	HtPU *path_to_idx = ht_pu_new0();
librz/bin/format/mach0/mach0.c:2524:	HtPP *hash = ht_pp_new0();
librz/bin/format/elf/elf_symbols.c:404:	HtPP *name_set = ht_pp_new0();
librz/bin/bobj_process_class.c:88:	o->name_to_class_object = ht_pp_new0();
librz/bin/bobj_process_class.c:89:	o->glue_to_class_method = ht_pp_new0();
librz/bin/bobj_process_class.c:90:	o->glue_to_class_field = ht_pp_new0();
librz/bin/bobj_process_section.c:36:	HtPP *filter_db = bin->filter ? ht_pp_new0() : NULL;
librz/bin/bobj_process_symbol.c:98:	o->import_name_symbols = ht_pp_new0();

Rot127 avatar Feb 19 '24 04:02 Rot127

HtPx is supposed to be used with strings as keys. As @wargio suggested, the proper fix would be in this case to introduce HtSx (S for strings). While HtPx should be a HtUx which simply casts the key pointers to ut64.

Rot127 avatar Feb 19 '24 07:02 Rot127

IIRC I used zeroized options for SetP in agraph.c so pointers are treated as integers Yeah ugly hack, I know.

pelijah avatar Feb 19 '24 07:02 pelijah

Ah nice. That is why it worked :D As long as we don't use SetP until it's refactored, it's fine (with the one exception of yours of course).

Rot127 avatar Feb 19 '24 08:02 Rot127

Wait, there are options to modify how the key should be compared. I think the doc is confusing, but it should be possible.

typedef struct Ht_(options_t) {
	HT_(ListComparator)
	cmp; // Function for comparing values. Returns 0 if eq.

In ht_inc.c:

static inline bool is_kv_equal(HtName_(Ht) * ht, const KEY_TYPE key, const ut32 key_len, const HT_(Kv) * kv) {
	if (key_len != kv->key_len) {
		return false;
	}

	bool res = key == kv->key;
	if (!res && ht->opt.cmp) {
		res = !ht->opt.cmp(key, kv->key);
	}
	return res;
}

ht_pp.c uses:

static HtName_(Ht) * internal_ht_default_new(ut32 size, ut32 prime_idx, HT_(DupValue) valdup, HT_(KvFreeFunc) pair_free, HT_(CalcSizeV) calcsizeV) {
	HT_(Options)
	opt = {
		.cmp = (HT_(ListComparator))strcmp,

so the keys are compared with strcmp, that's probably why you get some weird results. If you want to just compare the raw values (as int), don't use HtP, but use HtUx. Otherwise, create a custom Ht with .cmp set to something that makes sense for you.

ret2libc avatar Feb 19 '24 08:02 ret2libc

yes, but the current implementation is very hard to use. i would make a proper implementation and clear definition of their usages. also HtUx does not implement any hash function therefore the table is handled as an array.

wargio avatar Feb 19 '24 08:02 wargio

@wargio Interestingly not, I had a case when it used the buckets. Push the test for it in a second. Which just strengthens your point to make the functioning of Ht clear to the user.

Rot127 avatar Feb 19 '24 09:02 Rot127

bool test_htuu_uses_buckets(void) {
       HtUU *ht = ht_uu_new0();
       ht_uu_insert(ht, 0, 1);
       ht_uu_insert(ht, 1, 1);
       ht_uu_insert(ht, 2, 1);
       ht_uu_insert(ht, 3, 1);
       printf("\nht->count = %d\n", ht->count);
       ht_uu_insert(ht, 10000000, 1);
       printf("ht->count = %d\n", ht->count);
       for (int i = 0; i < ht->size; ++i) {
               printf("ht->table[%d].count = %d => [ ", i, ht->table[i].count);
               for (ut32 j = 0; j < ht->table[i].count; ++j) {
                       printf("%llu ", ht->table[i].arr[j].key);
               }
               printf("]\n");
       }

       // Output:
       // ht->count = 4
       // ht->count = 5
       // ht->table[0].count = 1 => [ 0 ]
       // ht->table[1].count = 1 => [ 1 ]
       // ht->table[2].count = 1 => [ 2 ]
       // ht->table[3].count = 2 => [ 3 10000000 ]
       // ht->table[4].count = 0 => [ ]
       // ht->table[5].count = 0 => [ ]
       // ht->table[6].count = 0 => [ ]

       mu_end;
}

Rot127 avatar Feb 19 '24 09:02 Rot127

HtUU should definitely use the bucketing.

I can improve the doc for Ht if needed as I wrote most of it. The hash function is just used to improve the way the keys are spread into buckets, so that (optimally) each bucket has mostly the same number of items. If the hash function is not provided we just use the % operator IIRC.

ret2libc avatar Feb 19 '24 09:02 ret2libc

Improved Ht docs would be awesome!

Rot127 avatar Feb 19 '24 10:02 Rot127

The situation is better now, with the introduction of HtSP, HtSS, HtSU, SetS.

XVilka avatar May 26 '24 12:05 XVilka

Yes! Also the _new functions enforce to define the behavior if necessary. So I would consider this as closed.

Rot127 avatar May 26 '24 13:05 Rot127