csFastFloat
Use a simpler intrinsics API to detect whether a buffer contains non-ASCII bytes
The code below could be simplified with `Sse2.MoveMask()` when SSE2 is available.
https://github.com/CarlVerret/csFastFloat/blob/76d6a2e76a6d3c1f90a7f4b9747661979158e7e4/csFastFloat/Utils/Utils.cs#L283-L292
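```csharp
using System;
using System.Runtime.Intrinsics.X86;

// Assumes an unsafe context where pBuffer is a byte* to at least 16 readable bytes.
if (Sse2.IsSupported)
{
    // MoveMask gathers the high bit of each of the 16 bytes; any set bit means a byte >= 0x80.
    uint currentSseMask = (uint)Sse2.MoveMask(Sse2.LoadVector128(pBuffer)); // unaligned load
    if (currentSseMask != 0)
    {
        Console.WriteLine("Non-ASCII byte present");
    }
}
```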
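For reference, here is a self-contained sketch of the same SSE2 idea over a whole buffer. It assumes the input is UTF-8 (or any byte stream), uses `MemoryMarshal.Read` for the unaligned 16-byte load so no `unsafe` code is needed, and falls back to a scalar loop for the tail and on hardware without SSE2. `AsciiCheck` and `IsAllAscii` are hypothetical names, not part of csFastFloat, and the check answers "is it ASCII?", not "is it a digit?".

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class AsciiCheck
{
    // Hypothetical helper: true if every byte in `data` is below 0x80 (plain ASCII).
    public static bool IsAllAscii(ReadOnlySpan<byte> data)
    {
        int i = 0;
        if (Sse2.IsSupported)
        {
            for (; i + 16 <= data.Length; i += 16)
            {
                // Unaligned 16-byte load without unsafe code.
                Vector128<byte> chunk = MemoryMarshal.Read<Vector128<byte>>(data.Slice(i, 16));
                // One mask bit per byte, copied from that byte's high bit.
                if (Sse2.MoveMask(chunk) != 0) return false;
            }
        }
        // Scalar tail, and the whole buffer when SSE2 is unavailable.
        for (; i < data.Length; i++)
        {
            if (data[i] >= 0x80) return false;
        }
        return true;
    }
}
```

A non-zero mask only tells you that some byte has its high bit set; `BitOperations.TrailingZeroCount` on the mask would give its position if that is needed.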
The code in place checks that the UTF-16 words represent digits.
Your proposed solution checks for the presence of byte values with the high bit set. Are you assuming that the input is in UTF-8 format? I would be surprised if your solution worked as an ASCII check on UTF-16 input. Even so, we do not want to check for ASCII; we want to check for digits.
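To make the UTF-16 point concrete: MoveMask only looks at the top bit of each byte, so on UTF-16 data it would accept a character such as U+0100 (bytes 0x00 0x01), and it says nothing about digits. It can still serve as the final reduction step of a digit test. The sketch below is not the code currently in csFastFloat; it assumes SSE2 only, and `Utf16DigitCheck`/`AllDigits` are hypothetical names. It treats 8 chars as eight 16-bit lanes and uses the fact that a lane is a digit exactly when `c - '0'` is at most 9 as an unsigned value.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class Utf16DigitCheck
{
    // Hypothetical helper: true if every UTF-16 code unit in `chars` is in '0'..'9'.
    public static bool AllDigits(ReadOnlySpan<char> chars)
    {
        int i = 0;
        if (Sse2.IsSupported)
        {
            Vector128<ushort> zeroChar = Vector128.Create((ushort)'0');
            Vector128<ushort> nine = Vector128.Create((ushort)9);
            for (; i + 8 <= chars.Length; i += 8)
            {
                // Reinterpret 8 chars (16 bytes) as eight 16-bit lanes.
                Vector128<ushort> v = MemoryMarshal.Read<Vector128<ushort>>(
                    MemoryMarshal.AsBytes(chars.Slice(i, 8)));
                // c - '0' wraps around for c < '0', so out-of-range lanes become large.
                Vector128<ushort> t = Sse2.Subtract(v, zeroChar);
                // Unsigned saturating subtract is 0 exactly for lanes with t <= 9.
                Vector128<ushort> over = Sse2.SubtractSaturate(t, nine);
                Vector128<ushort> ok = Sse2.CompareEqual(over, Vector128<ushort>.Zero);
                // Every lane 0xFFFF means all 16 byte mask bits are set.
                if (Sse2.MoveMask(ok.AsByte()) != 0xFFFF) return false;
            }
        }
        // Scalar tail, and the whole input when SSE2 is unavailable.
        for (; i < chars.Length; i++)
        {
            if ((uint)(chars[i] - '0') > 9) return false;
        }
        return true;
    }
}
```

The subtract-then-saturate pair stands in for the unsigned 16-bit comparison that SSE2 lacks.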
Or am I misunderstanding?
> The code in place checks that the UTF-16 words represent digits.
OK. The code I had was for UTF-8, and yes, it only determines whether the bytes are valid ASCII. There may be intrinsics (I need to double-check which ones) that do this faster with SSE2, but SSE4.1 may be limited for this task, and what you have might already be the best approach. Have you tried a trick based on the logic you describe in https://lemire.me/blog/2018/09/30/quickly-identifying-a-sequence-of-digits-in-a-string-of-characters/?
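As an illustration of that kind of trick, here is a SWAR (SIMD within a register) digit test. The 8-byte formula is, as far as I know, the one used by the C++ fast_float library for ASCII input; I am assuming it captures the idea of the linked post. The 4-char UTF-16 variant is my own hypothetical adaptation (same construction with 16-bit lanes), not something taken from the post or from csFastFloat, and `SwarDigits` and its methods are hypothetical names.

```csharp
using System;
using System.Buffers.Binary;
using System.Runtime.InteropServices;

public static class SwarDigits
{
    // 8 bytes in one ulong. For a byte b, (b + 0x46) has its high bit set iff b > '9',
    // and (b - 0x30) has it set iff b < '0' (or b >= 0xB0); both clear only for '0'..'9'.
    public static bool IsEightDigits(ulong val) => unchecked(
        (((val + 0x4646464646464646UL) | (val - 0x3030303030303030UL)) & 0x8080808080808080UL) == 0);

    public static bool IsEightDigits(ReadOnlySpan<byte> bytes) =>
        IsEightDigits(BinaryPrimitives.ReadUInt64LittleEndian(bytes));

    // Hypothetical UTF-16 adaptation: four 16-bit code units per ulong,
    // with 0x7FC6 = 0x8000 - 0x3A and 0x0030 = '0' per lane.
    public static bool IsFourDigits(ulong val) => unchecked(
        (((val + 0x7FC67FC67FC67FC6UL) | (val - 0x0030003000300030UL)) & 0x8000800080008000UL) == 0);

    public static bool IsFourDigits(ReadOnlySpan<char> chars) =>
        IsFourDigits(MemoryMarshal.Read<ulong>(MemoryMarshal.AsBytes(chars)));
}
```

Carries and borrows can leak between lanes, but only out of a lane that is already invalid, so the lowest invalid lane is always reported and the result is exact.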
Can you elaborate?
I will come up with some code when I get a chance.