claripy icon indicating copy to clipboard operation
claripy copied to clipboard

Fix ast/base.py hash issues

Open zwimer opened this issue 3 years ago • 10 comments

This fixes multiple issues:

  1. Non-integer (and non-str) ._hash values for ASTs. Notice how https://github.com/angr/claripy/blob/51d17532025b87fd7d6eafa6ca87fa23df4b800d/claripy/ast/base.py#L226 can propagate to https://github.com/angr/claripy/blob/51d17532025b87fd7d6eafa6ca87fa23df4b800d/claripy/ast/base.py#L239 Fundamentally, if ._hash is never used as a hash of the AST, this isn't an issue; i.e. if only hash(my_ast) is used it is ok because we do convert in that function on demand. The problem is, that ._hash is used as the hash.
  2. Differing behavior if annotations exist / do not. At this if statement https://github.com/angr/claripy/blob/51d17532025b87fd7d6eafa6ca87fa23df4b800d/claripy/ast/base.py#L218 notice we have special cases for floats like fixing the problem that hash(0.0) == hash(-0.0) here: https://github.com/angr/claripy/blob/51d17532025b87fd7d6eafa6ca87fa23df4b800d/claripy/ast/base.py#L221 But this only happens if not annotations.
  3. Fix anything that uses _calc_hash to determine what key to enter into a hash cache for looking up existing ASTs; since for some cases our cache uses the tuples made via our fast-path here: https://github.com/angr/claripy/blob/51d17532025b87fd7d6eafa6ca87fa23df4b800d/claripy/ast/base.py#L218 Ex: https://github.com/angr/claripy/blob/51d17532025b87fd7d6eafa6ca87fa23df4b800d/claripy/ast/base.py#L253
  4. Reduce code duplication between Base.__new__ and Base.__init_with_annotations__.
  5. Fix a bug in Base.__init_with_annotations__ where we didn't use the leaf cache at all and instead were looking for leaves in the non-leaf cache: https://github.com/angr/claripy/blob/51d17532025b87fd7d6eafa6ca87fa23df4b800d/claripy/ast/base.py#L251
  6. Non-reproducible hashes; since self._hash was not always calculated via _calc_hash but sometimes as a tuple (see point 1).
  7. When deserializing via base.py's _d function https://github.com/angr/claripy/blob/51d17532025b87fd7d6eafa6ca87fa23df4b800d/claripy/ast/base.py#L58 this bypasses the leaf cache and incorrectly stores the leaves in the non-leaf cache.
  8. Serialization is incomplete; it looses information such as +0.0 vs -0.0; thus deserialization may be incorrect.
  9. (Unknown if applicable) If _d always takes an integer (or str) as a hash, deserialization will lead to ASTs with corrupt _hashs because of point 1.

Before merging:

  • [ ] Add fastpath for annotation-free children (this was removed temporarily to get hashing work first)

Linked: https://github.com/angr/binaries/pull/101

zwimer avatar Oct 06 '22 02:10 zwimer

Unit Test Results

     94 files  +     84       94 suites  +84   1h 49m 17s :stopwatch: + 1h 48m 49s 1 436 tests +1 130  1 342 :heavy_check_mark: +1 096  90 :zzz: +30  0 :x: ±0  4 :fire: +4  1 442 runs  +1 136  1 348 :heavy_check_mark: +1 102  90 :zzz: +30  0 :x: ±0  4 :fire: +4 

For more details on these errors, see this check.

Results for commit 17f91514. ± Comparison against base commit adbf300b.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Oct 06 '22 09:10 github-actions[bot]

Ping @ltfish Regenerate all caches please

zwimer avatar Oct 25 '22 23:10 zwimer

The reason why you couldn't regenerate caches locally: https://github.com/angr/claripy/blob/fix/hash-fail/claripy/ast/base.py#L250

Can you fix it (by removing the breakpoint() call) and then try generating caches across all projects again?

ltfish avatar Oct 27 '22 19:10 ltfish

ROP caches updated. You will want to rebase this branch on master to re-trigger the CI.

ltfish avatar Jan 02 '23 08:01 ltfish

@ltfish The same issues I was seeing before still seem to occur in the CI.

zwimer avatar Jan 09 '23 22:01 zwimer

@zwimer what issue?

ltfish avatar Jan 09 '23 22:01 ltfish

The four decompiler test cases fail because the fix/hash-fail branch in angr/binaries misses some new binaries that I pushed over the weekend. So they are totally expected.

ltfish avatar Jan 09 '23 22:01 ltfish

You should only pay attention to the "push" tests since "PR" tests do not synchronize repos based on the same branch name.

ltfish avatar Jan 09 '23 22:01 ltfish

@ltfish The push CI is failing too: https://github.com/angr/claripy/actions/runs/3841788720/jobs/6576714321

zwimer avatar Jan 09 '23 23:01 zwimer

@ltfish The push CI is failing too: https://github.com/angr/claripy/actions/runs/3841788720/jobs/6576714321

That's exactly what my reply was about.

ltfish avatar Jan 09 '23 23:01 ltfish