retdec

Detect idiom: deserializing 32- and 64-bit integers

Open · KOLANICH opened this issue 3 years ago · 2 comments

NumericT<32> int32FromLEOffset(NumericT<8> * start){
	return int32From2Int16(int16FromLEOffset(start), int16FromLEOffset(start + 2));
}
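A runnable sketch of the 32-bit case follows. It assumes NumericT<8>, NumericT<16> and NumericT<32> correspond to std::uint8_t, std::uint16_t and std::uint32_t. The bodies of int16FromLEOffset and int32From2Int16 are hypothetical and written only for illustration; the snippet above gives just their names.

```cpp
#include <cstdint>

// Assumption: NumericT<8/16/32> correspond to these fixed-width unsigned types.
using u8 = std::uint8_t;
using u16 = std::uint16_t;
using u32 = std::uint32_t;

// Hypothetical helper: read a little-endian 16-bit value starting at `start`.
inline u16 int16FromLEOffset(const u8 *start) {
    return static_cast<u16>(start[0] | (start[1] << 8));
}

// Hypothetical helper: combine a low half and a high half into 32 bits.
inline u32 int32From2Int16(u16 low, u16 high) {
    return static_cast<u32>(low) | (static_cast<u32>(high) << 16);
}

// Little-endian: the half stored at the lower address is the low half.
inline u32 int32FromLEOffset(const u8 *start) {
    return int32From2Int16(int16FromLEOffset(start), int16FromLEOffset(start + 2));
}
```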

and, in general:

template <unsigned N>
NumericT<N> intNLEFromOffset(NumericT<8> * start){
	// low half at start, high half N/2 bits (N/2/8 bytes) later
	return intNFrom2HalfIntNs<N>(intNLEFromOffset<N/2>(start), intNLEFromOffset<N/2>(start + N/2/8));
}
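The generic recursion needs a base case to terminate. The sketch below models NumericT<N> as the N-bit unsigned fixed-width type (an assumption), gives a hypothetical body for intNFrom2HalfIntNs, and stops the recursion at single bytes using if constexpr (C++17).

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: NumericT<N> is modeled as the N-bit unsigned fixed-width type.
template <std::size_t N> struct NumericImpl;
template <> struct NumericImpl<8>  { using type = std::uint8_t; };
template <> struct NumericImpl<16> { using type = std::uint16_t; };
template <> struct NumericImpl<32> { using type = std::uint32_t; };
template <> struct NumericImpl<64> { using type = std::uint64_t; };
template <std::size_t N> using NumericT = typename NumericImpl<N>::type;

// Hypothetical helper: join a low and a high N/2-bit half into an N-bit value.
template <std::size_t N>
NumericT<N> intNFrom2HalfIntNs(NumericT<N / 2> low, NumericT<N / 2> high) {
    using Wide = NumericT<N>;
    return static_cast<Wide>((static_cast<Wide>(high) << (N / 2)) | static_cast<Wide>(low));
}

template <std::size_t N>
NumericT<N> intNLEFromOffset(const NumericT<8> *start) {
    if constexpr (N == 8) {
        return *start;  // base case: a single byte needs no combining
    } else {
        // Little-endian: the lower address holds the low half.
        return intNFrom2HalfIntNs<N>(intNLEFromOffset<N / 2>(start),
                                     intNLEFromOffset<N / 2>(start + N / 2 / 8));
    }
}
```

Because the recursion bottoms out at bytes, a 64-bit read expands into the same eight byte loads and shifts that a hand-written reader would use, which is the pattern the decompiler would have to recognize.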

Analogously for big-endian.
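For big-endian data the only change is which half comes first: the half at the lower address is the high half. A minimal sketch under the same assumption about NumericT<N>; the name intNBEFromOffset is hypothetical.

```cpp
#include <cstddef>
#include <cstdint>

// Assumption: NumericT<N> is modeled as the N-bit unsigned fixed-width type.
template <std::size_t N> struct NumericImpl;
template <> struct NumericImpl<8>  { using type = std::uint8_t; };
template <> struct NumericImpl<16> { using type = std::uint16_t; };
template <> struct NumericImpl<32> { using type = std::uint32_t; };
template <> struct NumericImpl<64> { using type = std::uint64_t; };
template <std::size_t N> using NumericT = typename NumericImpl<N>::type;

// Hypothetical big-endian counterpart: the half at the LOWER address is the
// HIGH half, so the two recursive reads swap roles compared to little-endian.
template <std::size_t N>
NumericT<N> intNBEFromOffset(const NumericT<8> *start) {
    if constexpr (N == 8) {
        return *start;  // base case: a single byte
    } else {
        using Wide = NumericT<N>;
        Wide high = intNBEFromOffset<N / 2>(start);
        Wide low = intNBEFromOffset<N / 2>(start + N / 2 / 8);
        return static_cast<Wide>((high << (N / 2)) | low);
    }
}
```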

KOLANICH, Sep 19 '21 21:09

https://github.com/avast/retdec/issues/1022#issuecomment-932218952

PeterMatula, Oct 01 '21 13:10

Is this some kind of compiler idiom?

No, it is written explicitly in the source code. But most real-world implementations wrap this idiom in a function or method.

Do you have some input on where it occurs?

It occurs in code that parses file formats and network packets from memory buffers, including memory-mapped files. When an int is read from a file through a stream-based interface, it is usually read into memory first and then parsed from there. (It could alternatively be read as-is, but people usually use an implementation that reads the bytes and combines them into an int, and very often the compiler fails to optimize this away even when the endianness of the ints in the buffer matches the machine's endianness.)
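A sketch of the kind of parsing code described above, assuming a made-up header layout (a little-endian u16 tag at offset 0 followed by a little-endian u32 length at offset 2). The Header struct, readLE and parseHeader are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical on-disk/on-wire header, invented for illustration:
//   offset 0: u16 tag (little-endian), offset 2: u32 length (little-endian)
struct Header {
    std::uint16_t tag;
    std::uint32_t length;
};

// Typical hand-written reader: fetch bytes one by one and combine them with
// shifts. nBytes must be at most 4 so every shift stays below 32 bits.
inline std::uint32_t readLE(const std::uint8_t *p, std::size_t nBytes) {
    std::uint32_t v = 0;
    for (std::size_t i = 0; i < nBytes; ++i) {
        v |= static_cast<std::uint32_t>(p[i]) << (8 * i);
    }
    return v;
}

inline Header parseHeader(const std::uint8_t *buf) {
    Header h;
    h.tag = static_cast<std::uint16_t>(readLE(buf, 2));
    h.length = readLE(buf + 2, 4);
    return h;
}
```

If the compiler does not fold readLE into a single load, the binary keeps the per-byte loads and shifts, and that is what the decompiler then has to reconstruct.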

What do we produce at the moment?

Sometimes bit operations, sometimes integer arithmetic with the same effect, and very often a mix of the two. Pointer arithmetic shows up as well.
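To illustrate why the output varies, here are three hypothetical spellings of the same little-endian 32-bit read: bit operations, integer arithmetic, and a mix with pointer arithmetic. They compute identical values, so an idiom detector would need to treat all of them as one pattern. The function names are invented for this sketch.

```cpp
#include <cstdint>

// Bit operations: shift each byte into place and OR the results.
inline std::uint32_t viaBits(const std::uint8_t *p) {
    return static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Integer arithmetic: multiply by powers of 256 and add.
inline std::uint32_t viaArithmetic(const std::uint8_t *p) {
    return static_cast<std::uint32_t>(p[0]) + static_cast<std::uint32_t>(p[1]) * 0x100u +
           static_cast<std::uint32_t>(p[2]) * 0x10000u + static_cast<std::uint32_t>(p[3]) * 0x1000000u;
}

// Mixed: pointer arithmetic, shift-or for the low half, multiply-add for the
// high half, and '+' where the bit ranges do not overlap (so '+' equals '|').
inline std::uint32_t viaMixed(const std::uint8_t *p) {
    std::uint32_t low = static_cast<std::uint32_t>(*p | (*(p + 1) << 8));
    std::uint32_t high = *(p + 2) + *(p + 3) * 256u;
    return low + (high << 16);
}
```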

What do you suggest we should produce? (Do I guess right that you propose a function call?)

Yes. Add a (possibly inline) function and use it every time the idiom occurs. We cannot simply emit int *a=(int *)(ptr + offset); because that is not portable across machines with different endianness.
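A minimal sketch of the suggested output, assuming a hypothetical helper named read_u32_le that the decompiler would emit once and call at every recognized site. For contrast, read_u32_host reinterprets the same bytes in host order (using memcpy rather than a raw pointer cast to sidestep alignment and aliasing problems); its result matches read_u32_le only on a little-endian host, which is the portability issue with the cast.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: little-endian 32-bit read whose result does not
// depend on the host byte order.
static inline std::uint32_t read_u32_le(const std::uint8_t *ptr, std::size_t offset) {
    const std::uint8_t *p = ptr + offset;
    return static_cast<std::uint32_t>(p[0]) | (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) | (static_cast<std::uint32_t>(p[3]) << 24);
}

// Contrast: copy the bytes into a uint32_t as-is, i.e. interpret them in
// host byte order, like the pointer cast would.
static inline std::uint32_t read_u32_host(const std::uint8_t *ptr, std::size_t offset) {
    std::uint32_t v;
    std::memcpy(&v, ptr + offset, sizeof v);
    return v;
}

// True if the least significant byte of a uint32_t is stored first.
static inline bool hostIsLittleEndian() {
    const std::uint32_t one = 1;
    std::uint8_t first;
    std::memcpy(&first, &one, 1);
    return first == 1;
}
```

A decompiled site would then read as a single call such as read_u32_le(ptr, offset) instead of a reconstructed chain of shifts, multiplications and additions.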

KOLANICH, Oct 01 '21 16:10