smhasher icon indicating copy to clipboard operation
smhasher copied to clipboard

null test

Open wangyi-fudan opened this issue 5 years ago • 5 comments

bool NullTest ( struct HashInfo *info ){
  pfHash hash = info->hash;
  unsigned long long h0,	h1;	
  hash(NULL,	0,	0,	&h0);	
  hash(NULL,	0,	1,	&h1);
  printf("nullkey_seed0\t%llu\nnullkey_seed1\t%llu\n%s\n",	h0,	h1,	
           h0==h1?"FAIL":(h0==0||h1==1?"Imperfect":"PASS"));
  return	h0!=h1;
}  

clhash FAIL

fasthash64 Imperfect

City64 Imperfect

MUM Imperfect

xxh3 Imperfect

t1ha2_atonce Imperfect

wangyi-fudan avatar Mar 25 '19 15:03 wangyi-fudan

Thanks, but I'm not sure if this is a valid assumption. The return value with a NULL key is undefined, it just should not access out of bounds. Several hashes were OOB.

rurban avatar Mar 25 '19 16:03 rurban

It is just an extreme case test to see if these hashes works properly. if they do memory access, they will fail. xxh3 leaks its key if 0 byte string is processed. some hashes returns zero is key is zero at 0 byte, which is not so good.

wangyi-fudan avatar Mar 25 '19 21:03 wangyi-fudan

I think this is valid test, i.e. function should not read data when len = 0. On the other hand "Imperfect" criteria is wrong.

For instance, t1ha2_atonce() pass avalanche test for len=0 (i.e. for seeds only) by https://github.com/demerphq/smhasher. But t1ha2_atonce() intentionally return 0, when both seed and data-length are zero (and I think that is normal behavior).

erthink avatar Apr 01 '19 12:04 erthink

xxh3 makes a distinction between its secret key, and the seed. The secret key is hard to generate, while the seed is just a parameter of the function, which could change on a whim.

xxh3 with a length of 0 and a seed of 0 will return 0, and, like @leo-yuriev, I also believe this is correct behavior. At least, I received requests for such a feature.

When length is 0 but seed is not 0, xxh3 returns the seed (not the secret key !). If this is considered too much of a secret to share, I can change the function so that it returns something less obvious. After all, optimizing the return value for len==0 is not terribly high in priority order. Note that, since it's basically a bijection, whatever its complexity, it will always be possible to brute-force the reverse transformation (given enough resource).

Cyan4973 avatar Apr 23 '19 22:04 Cyan4973

I'll add something like that. It is similar to some BadSeeds tests, where a 0 seed with a NULL key (or len 0) returns mostly 0, which is fair, but not ok for len > 0 and only null bytes. Yann made a good point revealing the secret. I think we should print that case.

This is a cheap test for some edge cases.

rurban avatar Apr 13 '21 06:04 rurban