zstd
Recursive inline code failing to build with WindRiver 6.9 gnu compiler 4.3.3
When I compile the ZSTD code in a VxWorks 6.9 environment using GNU compiler version 4.3.3, the following error is reported when it attempts to compile the zstd_lazy.c file:
ZSTD/compress/zstd_lazy.c: In function 'ZSTD_RowFindBestMatch_dedicatedDictSearch_6_6':
ZSTD/compress/zstd_lazy.c:988: sorry, unimplemented: inlining failed in call to 'ZSTD_row_getSSEMask': recursive inlining
ZSTD/compress/zstd_lazy.c:1018: sorry, unimplemented: called from here
C:\WindRiver\WindRiver_6_9\utilities-1.0\x86-win32\bin\make.exe: *** [build/zstd_lazy.o] Error 1
Exiting.
I was previously compiling exactly the same zstd source code in a VxWorks 6.8 environment, using GNU compiler version 4.1.2, without any issues. On the face of it, the compiler's handling of inline code appears to have changed between 4.1.2 and 4.3.3.
As a quick and easy workaround I have modified the code in zstd_lazy.c to prevent this function from being inlined, as follows:
/* FORCE_INLINE_TEMPLATE */ ZSTD_VecMask   // <----- Comment out the inline qualifier
ZSTD_row_getSSEMask(int nbChunks, const BYTE* const src, const BYTE tag, const U32 head)
{
    const __m128i comparisonMask = _mm_set1_epi8((char)tag);
    int matches[4] = {0};
    int i;
    assert(nbChunks == 1 || nbChunks == 2 || nbChunks == 4);
    for (i=0; i<nbChunks; i++) {
        const __m128i chunk = _mm_loadu_si128((const __m128i*)(const void*)(src + 16*i));
        const __m128i equalMask = _mm_cmpeq_epi8(chunk, comparisonMask);
        matches[i] = _mm_movemask_epi8(equalMask);
    }
    if (nbChunks == 1) return ZSTD_rotateRight_U16((U16)matches[0], head);
    if (nbChunks == 2) return ZSTD_rotateRight_U32((U32)matches[1] << 16 | (U32)matches[0], head);
    assert(nbChunks == 4);
    return ZSTD_rotateRight_U64((U64)matches[3] << 48 | (U64)matches[2] << 32 | (U64)matches[1] << 16 | (U64)matches[0], head);
}
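A less invasive variant of this workaround might be to gate the forced-inline qualifier on the compiler version instead of commenting it out at one call site. The following is only a minimal sketch of that idea: the macro names MY_FORCE_INLINE and MY_FORCE_INLINE_ACTIVE and the helper rotate_right_16 are hypothetical and not part of zstd, and the GCC 4.4 cutoff is an assumption on my part, since I only know that 4.3.3 fails and have not tested later releases.

```c
/* Hypothetical sketch, not zstd code: request forced inlining only on
 * compilers assumed to handle it, and fall back to a plain static function
 * elsewhere. The 4.4 threshold is an untested assumption. */
#if defined(__GNUC__) && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 4))
#  define MY_FORCE_INLINE static inline __attribute__((always_inline))
#  define MY_FORCE_INLINE_ACTIVE 1
#else
#  define MY_FORCE_INLINE static   /* let the compiler decide whether to inline */
#  define MY_FORCE_INLINE_ACTIVE 0
#endif

/* Stand-in for a small helper that would normally be force-inlined:
 * rotate a 16-bit value right by count bits (count is taken modulo 16). */
MY_FORCE_INLINE unsigned short rotate_right_16(unsigned short value, unsigned count)
{
    count &= 15;
    return (unsigned short)((value >> count) | (value << ((0U - count) & 15)));
}
```

With this approach the function body stays identical on every toolchain, and only the qualifier changes, so an older compiler would simply lose the forced inlining rather than fail the build.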
Is there a bug in the ZSTD code itself, related to how it defines the inline qualifier? Or does it need a particular optimisation flag (or set of flags) to be passed to gcc on the command line during compilation? I am currently using -O2.