gigahorse-toolchain icon indicating copy to clipboard operation
gigahorse-toolchain copied to clipboard

Flow analysis + memory modeling

Open snf opened this issue 2 years ago • 4 comments

Hi, first I'd like to thank you for making gigahorse open source. It's an amazing research project and tool. This is not a concrete issue but more of a brain dump with which am looking for ideas.

I've been using gigahorse to write rules to experiment with known bugs and found a limitation at the time of making flow analysis (both using clientlib and symvalic).

To me it looks like the main issue is that the analysis passes don't tend to make a lot of use of the memory modeling module so data flowing through the memory gets "lost". Even after adding relations to improve the memory modeling (specially around MemoryCopyLoop) for the latest solidity versions (0.7, 0.8), the flow analysis tend to be limited to statement uses and defines.

So my question is what do you think it would be a good way to augment the flow analysis passes with memory modeling? I was thinking of adding more more "Array/Tuple/Memory Contents" relations to DataFlows/VarMayBe but I may be missing context on why this is the way it is.

And the other is, have you considered or have thoughts about making a "de-aliasing" pass? I haven't given it too much thought but it looks to me like something as basic as merging aliases with MLOADSFreePtrUnchanged would normalize all the array/tuple access and make the other passes much easier to follow (I'm aware of the symbolic array equivalence but don't think that's used for flow anal). On the other hand, I think, some information could be lost in the process (RawDataArray -> AbiArray, etc) or completely fail if the modeling is inaccurate.

Also, do you hang out in any messaging platform (Discord or other) to discuss gigahorse and evm/solidity?

snf avatar May 22 '22 10:05 snf

Hi Sebastian, and thanks for the kind words. Here are some quick thoughts and others can also comment. The "flow analysis" is not (should not be) the main workhorse for a proper higher-level analysis of contracts. So, I would suggest to do what we also do in our uses of the framework: build analyses that do use the memory modeling, and possibly combine it with flow analysis (or don't--e.g., you may want something more precise, that takes predicates into account).

As for the memory de-aliasing, I'm not sure I follow the suggestion, because I had thought that this is exactly what the "free pointer is unchanged" predicates are used for already. But I haven't looked in a long while, and we'd love to work more on this direction, anyway. Do you have specific examples of contracts/bytecode patterns where you think the decompiler should understand aliases a lot more?

Thanks!

yanniss avatar May 23 '22 07:05 yanniss

Thank you Yannis! One of the simplest examples I can think of is the one to detect when a call uses a static function signature:

CallToConstant(call):-
  Variable_Value(base, _),
  Variable_NumericValue(shift, 0xe0),
  SHL(stmt, shift, base, shiftedVar),
  // vvv Needs to be replaced
  standardflowanalysis.Flows(shiftedVar, actualShifted),
  StatementUsesMemory_ActualMemoryArg(call, _, 0, actualShifted),
  CALLStatement(call, _).

Flows there is precise but ignores data that flows through the memory, so for this particular case it has lots of false negatives.

As you mention, "free pointer is unchanged" is a good way of improving this analysis but I'm confused of why it's not used in the different flow analysis passes.

My idea was to re-implement (or expand) one of the flows passes with memory modeling and pointer aliasing. I see that this was baked in the symvalic analysis to some extent so I'd like your input on if this is even a good idea or if you have a pointer to what would be a good way of implementing it?

snf avatar May 23 '22 21:05 snf

Hi Sebastian,

As Yannis said we haven't added data flows through memory to the main *Flows components, and only done so in specific clients when its needed (to some extend when porting ethainter, the symvalic analysis, etc).

If I were to make a flows component that tracked data flow through memory I'd do it adding more base case Flows(from, to) inferences using information exported by the memory modeling API (with some more documentation here). This has been the motivation behind the memory modeling library: to export all the higher-level inferences a security analysis can need, so that it doesn't have to deal with the low-level intricacies of memory pointers, encodings etc. Do you have a specific reason to want to include such information about pointer aliasing in a data flow analysis?

Thanks, Sifis

sifislag avatar May 26 '22 16:05 sifislag

Thank you Sifis, I ended up hacking together something similar to what's used in ethainter's InfoFlows but added logic to track the flow through the memory (ArrayLoad/ArrayStore) in addition to merging the pointers (MLOADSFreePtrUnchanged) and memory copy loops.

The reason for having the pointer aliasing information in the flow analysis is that I needed to track what's going in the calldata array (to be passed to other functions). I see that solidity does something like

var1 = *freeptr
var1.len = len
var1.data[0:4] = sighash
var2 = *freeptr
var2.data[4:36] = arg1
...
*freeptr += len
var3 = *freeptr
memcpy(var3, var2.data, var2.len)
addr.call(var3, ...)

without tracking those aliases (var1 and var2) and memcpy (from var1/var2 to var3) information gets lost. So far only added these constructs to my poor man flow analysis which isn't precise but helps me track what is going into external calls.

On a side note, I think this information could also augment the decompilation used by contract-library which tends to have lots of MEM[64] without variable bindings.

snf avatar Jun 05 '22 17:06 snf