Robert Clausecker

67 comments by Robert Clausecker

The ideal block size can be determined by calling `fstat` on the file descriptor and then checking the `st_blksize` field.
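A minimal sketch of that lookup, assuming a POSIX system. It uses `fstat`, the variant of `stat` that takes a file descriptor rather than a path. The helper name `preferred_block_size` is made up for illustration and does not come from any library.

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: return the preferred I/O block size for fd
 * as reported in st_blksize, or -1 if fstat() fails.
 */
long preferred_block_size(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;

    /* st_blksize has type blksize_t; widen it for the caller. */
    return (long)st.st_blksize;
}
```

A caller would typically fall back to some default buffer size of its own choosing when the helper returns -1.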

I suspect a number of things. Probably there is double buffering going on. Also, as you read the whole file without checking its length first, the code likely employs an...

Sure, I'm currently working on the paper. Will get to it tomorrow perhaps. The idea is to use simple UNIX I/O: `read`, `write`, `open`, `close`. Use a fixed-size buffer of...
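One way such a loop could look is sketched below, assuming POSIX `read` and `write` on already opened descriptors. The function name `copy_fd` and the value of `BUF_SIZE` are assumptions made for illustration; the comment above is cut off before it states a buffer size.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed buffer size; the original comment does not give one. */
enum { BUF_SIZE = 65536 };

/*
 * Hypothetical helper: copy everything from in_fd to out_fd through
 * one fixed-size buffer. Returns 0 on success and -1 on error.
 */
int copy_fd(int in_fd, int out_fd)
{
    char buf[BUF_SIZE];

    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);

        if (n == 0)
            return 0;               /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, retry the read */
            return -1;
        }

        /* write() may accept fewer bytes than requested, so loop. */
        char *p = buf;
        while (n > 0) {
            ssize_t w = write(out_fd, p, (size_t)n);

            if (w < 0) {
                if (errno == EINTR)
                    continue;       /* interrupted, retry the write */
                return -1;
            }
            p += w;
            n -= w;
        }
    }
}
```

The descriptors would come from `open` and be released with `close` by the caller. Since the loop never asks for the file length up front, memory use stays constant regardless of input size.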

Nice code! You might be able to use `_mm512_alignr_epi32` to align the high and low surrogates. This needs to be done twice on the 32-bit extended output and saves...

> I am not sure I follow why is that. If there is a lone low surrogate at the start then V = (H ^ (L

> So, I think the situation you described (high surrogate in position 31 followed by low surrogate in position 32) is actually fine because of the conditional code I have...

@NicolasJiaxin You'll have to benchmark it. I can't say for sure, but as these are two different branches, it doesn't seem like it would be a problem.

We've talked about this before. It would be interesting to have a transcoder for the general case “single byte ASCII based encoding.” I can try to do that once I'm...

The auto-detection seems to be broken. It detects the Ice Lake system as a Haswell, and the measured speed is suspiciously low.

@lemire I tested the WojciechMula-avx512bw-utf8-to-utf16 branch and got speeds between 1 GB/s and 8 GB/s (for ASCII). Maybe that was not the right one?