LightningScanner
Non-scalar scanners overrun buffer
The AVX2 scanner reads 32 bytes at once, so as chunk approaches size, it ends up reading past the end of the buffer:
https://github.com/localcc/LightningScanner/blob/76e59b68c495f31b46438841553c5ae0bcdbfab3/src/backends/Avx2.cpp#L15-L17
The SSE4.2 scanner has the same issue: https://github.com/localcc/LightningScanner/blob/76e59b68c495f31b46438841553c5ae0bcdbfab3/src/backends/Sse42.cpp#L15-L17
This can cause crashes if there is no readable memory past the end of the buffer.
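To make the size of the overrun concrete, here is a minimal sketch that computes how many bytes a fixed-width load reads beyond the buffer. `OverreadBytes` is a hypothetical helper written for illustration; it is not part of LightningScanner.

```cpp
#include <cstddef>

// Hypothetical helper (not part of LightningScanner): number of bytes that a
// load of `unit` bytes starting at `offset` reads beyond a buffer of `size`
// bytes. Returns 0 when the whole load stays in bounds.
constexpr std::size_t OverreadBytes(std::size_t size, std::size_t offset,
                                    std::size_t unit) {
    // Bytes available between `offset` and the end of the buffer.
    const std::size_t available = offset < size ? size - offset : 0;
    return available >= unit ? 0 : unit - available;
}
```

For example, with a 100-byte buffer a 32-byte load at offset 96 has only 4 valid bytes left, so it touches 28 bytes that do not belong to the buffer.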
Do you have a fix for it?
I just subtracted 31 from the size, which is more of a workaround than a real fix.
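Subtracting 31 from the size amounts to only starting loads at offsets where a full 32-byte read still fits. A minimal sketch of that bound, assuming 32-byte loads, could look like the following. `SafeLoadLimit` is a hypothetical name, and the explicit check is there because `size - 31` on an unsigned `size_t` wraps around for buffers shorter than 31 bytes.

```cpp
#include <cstddef>

// Hypothetical helper (not part of LightningScanner): number of start offsets
// at which a load of `unit` bytes stays entirely inside a buffer of `size`
// bytes. For unit == 32 and size >= 32 this equals `size - 31`.
constexpr std::size_t SafeLoadLimit(std::size_t size, std::size_t unit) {
    // Guard against unsigned wrap-around when the buffer is shorter than one unit.
    return size >= unit ? size - unit + 1 : 0;
}
```

A loop written as `for (size_t chunk = 0; chunk < SafeLoadLimit(size, 32); ...)` never loads past the end, but matches that would begin in the trailing bytes still need separate handling, which is why this is a workaround rather than a fix.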
@BlueAmulet How about this? It works for me; let me know if it works for you too.
ScanResult FindAvx2(const Pattern& patternData, void* startAddr, size_t size) {
    constexpr size_t UNIT_SIZE = 32;
    size_t processedSize = 0;
    __m256i pattern = _mm256_load_si256((__m256i*)patternData.data.data());
    __m256i mask = _mm256_load_si256((__m256i*)patternData.mask.data());
    __m256i allZeros = _mm256_set1_epi8(0x00);
    size_t chunk = 0;
    for (; chunk + UNIT_SIZE <= size; chunk += UNIT_SIZE) {
        __m256i chunkData = _mm256_loadu_si256((__m256i*)((char*)startAddr + chunk));
        __m256i blend = _mm256_blendv_epi8(allZeros, chunkData, mask);
        __m256i eq = _mm256_cmpeq_epi8(pattern, blend);
        if (_mm256_movemask_epi8(eq) == 0xffffffff) {
            processedSize += UNIT_SIZE;
            if (processedSize < patternData.unpaddedSize) {
                pattern = _mm256_load_si256((__m256i*)(patternData.data.data() + processedSize));
                mask = _mm256_load_si256((__m256i*)(patternData.mask.data() + processedSize));
            } else {
                char* matchAddr = (char*)startAddr + chunk - processedSize + UNIT_SIZE;
                return ScanResult((void*)matchAddr);
            }
        } else {
            pattern = _mm256_load_si256((__m256i*)patternData.data.data());
            mask = _mm256_load_si256((__m256i*)patternData.mask.data());
            processedSize = 0;
        }
    }
    if (chunk < size) {
        size_t remainingBytes = size - chunk;
        __m256i chunkData = _mm256_loadu_si256((__m256i*)((char*)startAddr + chunk));
        __m256i remainingMask = _mm256_set1_epi8(0x00);
        for (size_t i = 0; i < remainingBytes; ++i) {
            ((char*)&remainingMask)[i] = 0xFF;
        }
        __m256i blend = _mm256_blendv_epi8(allZeros, chunkData, remainingMask);
        __m256i eq = _mm256_cmpeq_epi8(pattern, blend);
        if (_mm256_movemask_epi8(eq) == 0xffffffff) {
            char* matchAddr = (char*)startAddr + chunk;
            return ScanResult((void*)matchAddr);
        }
    }
    return ScanResult(nullptr);
}
A fix for this, along with performance improvements, is planned, but I have been a bit busy lately. I will try to get it out soon.
@localcc That would be great. My solution did not work.
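One thing visible in the code posted above is that its tail branch (`if (chunk < size)`) still issues a full 32-byte `_mm256_loadu_si256` from `startAddr + chunk`, so the out-of-bounds read happens before the mask is applied. A common way to avoid that, sketched here under the assumption of 32-byte units, is to copy the remaining bytes into a zero-padded local buffer first and load from that instead. `CopyTailPadded` and `kUnitSize` are hypothetical names, not part of LightningScanner, and this is not necessarily the fix the maintainer has planned.

```cpp
#include <array>
#include <cstddef>
#include <cstring>

// Assumed vector width in bytes (32 for AVX2); hypothetical constant.
constexpr std::size_t kUnitSize = 32;

// Hypothetical helper (not part of LightningScanner): copies the bytes from
// `offset` up to the end of the buffer (at most kUnitSize of them) into a
// zero-initialized local array, so a full-width load never touches memory
// outside the original buffer.
inline std::array<unsigned char, kUnitSize> CopyTailPadded(const void* start,
                                                           std::size_t size,
                                                           std::size_t offset) {
    std::array<unsigned char, kUnitSize> padded{};  // all bytes start as zero
    if (offset < size) {
        std::size_t remaining = size - offset;
        if (remaining > kUnitSize) {
            remaining = kUnitSize;
        }
        std::memcpy(padded.data(),
                    static_cast<const unsigned char*>(start) + offset,
                    remaining);
    }
    return padded;
}
```

The returned array can then be passed to `_mm256_loadu_si256(reinterpret_cast<const __m256i*>(padded.data()))`. Because the padding is zero, pattern bytes equal to zero could compare equal against it, so the comparison would still need to reject positions at or beyond the number of bytes actually copied.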