Use simpler Intrinsic API to detect if there is ASCII byte or not

Open kunalspathak opened this issue 4 years ago • 4 comments

The code below can be simplified using Sse2.MoveMask() if SSE2 is available.

https://github.com/CarlVerret/csFastFloat/blob/76d6a2e76a6d3c1f90a7f4b9747661979158e7e4/csFastFloat/Utils/Utils.cs#L283-L292

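  if (Sse2.IsSupported)
  {
      // pBuffer is assumed to be a byte* with at least 16 readable bytes.
      // MoveMask collects the high bit of each of the 16 bytes into the low 16 bits.
      uint currentSseMask = (uint)Sse2.MoveMask(Sse2.LoadVector128(pBuffer)); // unaligned load
      if (currentSseMask != 0)
      {
          Console.WriteLine("Non-ASCII byte present");
      }
  }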

kunalspathak avatar Nov 10 '21 16:11 kunalspathak

The code in place checks that the UTF-16 words represent digits.

Your proposed solution checks for the presence of byte values with the high bit set. Are you assuming that the input is in UTF-8 format? I would be surprised if your solution worked as an ASCII check given UTF-16. Even so, we do not want to check for ASCII, we want to check for digits.

Or am I misunderstanding?
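
To make the concern concrete, here is a minimal sketch of what a byte-level MoveMask reports when the input is UTF-16. It is not code from csFastFloat: `MoveMaskDemo` and `HighBitMask` are hypothetical names, and the scalar fallback is only there so the sketch also runs on hardware without SSE2.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class MoveMaskDemo
{
    // Hypothetical helper: bit i of the result is the high bit of byte i
    // among the 16 bytes that back the first 8 UTF-16 chars of `text`.
    public static int HighBitMask(ReadOnlySpan<char> text)
    {
        if (text.Length < 8)
            throw new ArgumentException("Need at least 8 chars.", nameof(text));

        ReadOnlySpan<byte> bytes = MemoryMarshal.AsBytes(text.Slice(0, 8));

        if (Sse2.IsSupported)
        {
            // Read 16 bytes as a vector, then gather the sign bit of each byte.
            Vector128<byte> v = MemoryMarshal.Read<Vector128<byte>>(bytes);
            return Sse2.MoveMask(v);
        }

        // Scalar fallback with the same meaning as MoveMask.
        int mask = 0;
        for (int i = 0; i < 16; i++)
        {
            if ((bytes[i] & 0x80) != 0)
                mask |= 1 << i;
        }
        return mask;
    }
}
```

With this helper, `HighBitMask("12345678")` and `HighBitMask("abcdefgh")` are both 0, so a zero mask does not distinguish digits from letters. `HighBitMask("1234\u0100678")` is also 0 because neither byte of U+0100 has its high bit set, so on UTF-16 input a zero mask does not even establish that the text is ASCII.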

lemire avatar Nov 10 '21 17:11 lemire

> The code in place checks that the UTF-16 words represent digits.

OK. The code I had was for UTF-8 and yes, it only determines whether the bytes are valid ASCII. There are some intrinsics available (I need to double-check which ones) to do it faster with SSE2, but SSE4.1 might be too limited for that task, and what you have might be the best approach. Have you tried a trick along the lines of the logic you described in https://lemire.me/blog/2018/09/30/quickly-identifying-a-sequence-of-digits-in-a-string-of-characters/?
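
For reference, a rough sketch of how that kind of scalar trick might look when adapted to UTF-16. This is an assumption-laden illustration, not the blog post's exact code and not what csFastFloat does: `DigitCheck` and `IsMadeOfFourDigits` are hypothetical names, and the constants are widened to 16-bit lanes so that four chars are tested per 64-bit word.

```csharp
using System;
using System.Runtime.InteropServices;

public static class DigitCheck
{
    // Hypothetical helper: true when the first four UTF-16 chars of `text`
    // are all in the range '0'..'9' (0x0030..0x0039).
    public static bool IsMadeOfFourDigits(ReadOnlySpan<char> text)
    {
        if (text.Length < 4)
            return false;

        // Load four 16-bit chars into one 64-bit word (one char per lane).
        ulong val = MemoryMarshal.Read<ulong>(MemoryMarshal.AsBytes(text.Slice(0, 4)));

        const ulong HighMask = 0xFFF0FFF0FFF0FFF0UL; // everything but the low nibble
        const ulong Expected = 0x0030003000300030UL; // 0x003_ in every lane
        const ulong Six      = 0x0006000600060006UL;

        unchecked
        {
            // First test: every lane is 0x0030..0x003F.
            // Second test: adding 6 pushes 0x003A..0x003F up to 0x004_, so
            // only 0x0030..0x0039 still match afterwards.
            return (val & HighMask) == Expected
                && ((val + Six) & HighMask) == Expected;
        }
    }
}
```

For example, `DigitCheck.IsMadeOfFourDigits("1234")` returns true, while `"12a4"`, `"12:4"` and `"12/4"` all return false. A carry out of one lane can only occur for a lane value of 0xFFFA or above, which already fails the first test, so it cannot produce a false positive.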

kunalspathak avatar Nov 10 '21 17:11 kunalspathak

Can you elaborate?

lemire avatar Nov 10 '21 17:11 lemire

I will come up with some code when I get a chance.

kunalspathak avatar Nov 10 '21 17:11 kunalspathak