retdec
Detect idiom: deserializing 32- and 64-bit integers
NumericT<32> int32FromLEOffset(NumericT<8> * start){
return int32From2Int16(int16FromLEOffset(start), int16FromLEOffset(start + 2));
}
and generally
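template <unsigned N>
NumericT<N> intNLEFromOffset(NumericT<8> * start){
// two half-width reads; the second starts N/2 bits, i.e. N/2/8 bytes, further on
return intNFrom2HalfIntNs<N>(intNLEFromOffset<N/2>(start), intNLEFromOffset<N/2>(start + N/2/8));
}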
Analogously for big endian.
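To make the recursion above concrete, here is a minimal C++ sketch of the same idea using ordinary fixed-width integers instead of NumericT. The names UIntOfWidth, UIntT, readLE and readBE are hypothetical and chosen only for illustration; they are not part of retdec. The base case (a single byte) is an assumption that the pseudocode leaves implicit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: maps a bit width to an unsigned integer type.
template <unsigned N> struct UIntOfWidth;
template <> struct UIntOfWidth<8>  { using type = std::uint8_t;  };
template <> struct UIntOfWidth<16> { using type = std::uint16_t; };
template <> struct UIntOfWidth<32> { using type = std::uint32_t; };
template <> struct UIntOfWidth<64> { using type = std::uint64_t; };
template <unsigned N> using UIntT = typename UIntOfWidth<N>::type;

// Little endian: the first half-width value is the low half.
template <unsigned N>
UIntT<N> readLE(const std::uint8_t *start) {
    assert(start != nullptr);
    const UIntT<N> low  = readLE<N / 2>(start);
    const UIntT<N> high = readLE<N / 2>(start + N / 2 / 8);
    return static_cast<UIntT<N>>(low | (high << (N / 2)));
}

// Base case (assumed): an 8-bit value is the byte itself.
template <>
inline std::uint8_t readLE<8>(const std::uint8_t *start) {
    return *start;
}

// Big endian: the first half-width value is the high half.
template <unsigned N>
UIntT<N> readBE(const std::uint8_t *start) {
    assert(start != nullptr);
    const UIntT<N> high = readBE<N / 2>(start);
    const UIntT<N> low  = readBE<N / 2>(start + N / 2 / 8);
    return static_cast<UIntT<N>>(low | (high << (N / 2)));
}

template <>
inline std::uint8_t readBE<8>(const std::uint8_t *start) {
    return *start;
}
```

Real code rarely spells the recursion out like this; it usually appears already flattened into four or eight byte reads, which is the shape a detector would have to match.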
https://github.com/avast/retdec/issues/1022#issuecomment-932218952
Is this some kind of compiler idiom?
No, it is written explicitly in the source code. But most real-world implementations have a function or method for this idiom.
Do you have some input on where it occurs?
In code that parses file formats and network packets from memory buffers, including memory-mapped files. When an int is read from a file through a stream-based interface, the bytes are usually read into memory first and the int is then parsed from them. (Alternatively the int can be read as is, but people usually use an implementation that reads the bytes and then combines them into an int, and very often the compiler fails to optimize this away even when the endianness of the ints in the buffer matches the machine endianness.)
What do we produce at the moment?
Sometimes bit operations, sometimes integer arithmetic with the same effect. Very often a mix of the two. Also pointer arithmetic.
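For illustration, the three shapes mentioned above might look roughly like this when written out in C++ for a 32-bit little-endian read. This is a hand-written sketch, not actual retdec output, and the function names are hypothetical. All three compute the same value, which is why they could be treated as one idiom.

```cpp
#include <cassert>
#include <cstdint>

// Variant 1: bit operations only (shifts and ORs).
std::uint32_t readLE32Bitwise(const std::uint8_t *p) {
    assert(p != nullptr);
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Variant 2: integer arithmetic only (multiplications and additions).
std::uint32_t readLE32Arithmetic(const std::uint8_t *p) {
    assert(p != nullptr);
    return static_cast<std::uint32_t>(p[0])
         + static_cast<std::uint32_t>(p[1]) * 0x100u
         + static_cast<std::uint32_t>(p[2]) * 0x10000u
         + static_cast<std::uint32_t>(p[3]) * 0x1000000u;
}

// Variant 3: a mix of both, with explicit pointer arithmetic.
std::uint32_t readLE32Mixed(const std::uint8_t *p) {
    assert(p != nullptr);
    const std::uint32_t low =
        static_cast<std::uint32_t>(*p) +
        (static_cast<std::uint32_t>(*(p + 1)) << 8);
    const std::uint32_t high =
        static_cast<std::uint32_t>(*(p + 2)) |
        (static_cast<std::uint32_t>(*(p + 3)) * 0x100u);
    return low + high * 0x10000u;
}
```

The additions are interchangeable with ORs here because the shifted bytes occupy disjoint bit ranges, so no carries occur.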
What do you suggest we should produce? (Am I right in guessing that you propose a function call?)
Yeah. Add a (possibly inline) function and use it every time the idiom occurs. We cannot just emit int *a=(int *)(ptr + offset); because it is not portable across machines of different endianness.
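A small sketch of the difference, assuming hypothetical helper names readU32LE, readU32BE and readU32Native (none of them are retdec APIs). The native read uses std::memcpy rather than a pointer cast to avoid alignment and aliasing problems, but it has the same portability issue as the cast: its result depends on the host byte order, while the explicit helpers do not.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Portable: always interprets the four bytes as little endian.
inline std::uint32_t readU32LE(const std::uint8_t *p) {
    assert(p != nullptr);
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Portable: always interprets the four bytes as big endian.
inline std::uint32_t readU32BE(const std::uint8_t *p) {
    assert(p != nullptr);
    return (static_cast<std::uint32_t>(p[0]) << 24)
         | (static_cast<std::uint32_t>(p[1]) << 16)
         | (static_cast<std::uint32_t>(p[2]) << 8)
         |  static_cast<std::uint32_t>(p[3]);
}

// Not portable: copies the bytes as they are, so the value seen by the
// program depends on the byte order of the machine running it.
inline std::uint32_t readU32Native(const std::uint8_t *p) {
    assert(p != nullptr);
    std::uint32_t value = 0;
    std::memcpy(&value, p, sizeof value);
    return value;
}
```

Emitting a call such as readU32LE(ptr + offset) in the decompiled output would keep the original meaning on any host, whereas the native read only matches the original on hosts whose byte order equals that of the buffer.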