Daniel Lemire
When deserializing a bitmap, the result might be invalid, for example because of data corruption. The deserialization could still generate a bitmap without failure,...
As far as I can tell, CPM currently has 'git' as a hard dependency, but it does not warn the user if git is missing. Currently, if git is missing,...
The following blog post could be useful if we ever want to parse `uint8_t` values: https://lemire.me/blog/2023/11/28/parsing-8-bit-integers-quickly/
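For reference, a plain baseline for this task might look like the sketch below. It is not the technique from the blog post; it only relies on the standard `std::from_chars` (C++17), and the helper name `parse_uint8` is hypothetical.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper (not part of any library discussed here):
// parse a decimal string into a uint8_t.
// Returns std::nullopt for empty input, non-digit characters,
// trailing characters, or values above 255.
std::optional<std::uint8_t> parse_uint8(std::string_view s) {
  std::uint8_t value = 0;
  const char* first = s.data();
  const char* last = s.data() + s.size();
  // from_chars reports out-of-range values instead of wrapping around.
  auto [ptr, ec] = std::from_chars(first, last, value);
  if (ec != std::errc() || ptr != last) {
    return std::nullopt;
  }
  return value;
}
```

Any faster routine would have to agree with such a baseline on every input, so it can double as a test oracle.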
The library is already correct on these cases, but we should be explicit. Contributed by @deadalnix ``` 1/ Where float stop rounding to 0. 0.7006492321624085354e-45 0.7006492321624085355e-45 2/ Where float start...
The code in src/util/simdutf8check.h is suboptimal. It could be replaced either by... 1. [simdutf](https://github.com/simdutf/simdutf) which provides full support for Unicode (transcoding, validation, and so forth). 2. [is_utf8](https://github.com/simdutf/is_utf8) which provides just...
We only change the bulk of the processing, leaving the tail intact. It is currently no faster. GCC 12, Icelake. Main branch: ``` convert_latin1_to_utf8+icelake, input size: 199331, iterations: 30000, dataset:...
Currently, we are using simple UTF-8 to Latin 1 routines, but as remarked by @aqrit and others, they could be, and should be, better. @aqrit suggests something like this: https://gist.github.com/aqrit/ebcbd13a43ac4ee4ef05578074ad3631
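To make the problem concrete, here is a minimal scalar sketch of a UTF-8 to Latin 1 conversion. It is neither the library's code nor @aqrit's proposal; the function name `utf8_to_latin1_scalar` is hypothetical, and the sketch simply rejects any input that Latin 1 cannot represent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical scalar reference: convert UTF-8 to Latin 1.
// Returns std::nullopt if the input is malformed or contains a
// code point above U+00FF.
std::optional<std::string> utf8_to_latin1_scalar(std::string_view in) {
  std::string out;
  out.reserve(in.size());
  for (std::size_t i = 0; i < in.size(); ++i) {
    const auto lead = static_cast<std::uint8_t>(in[i]);
    if (lead < 0x80) {  // ASCII maps to itself
      out.push_back(static_cast<char>(lead));
      continue;
    }
    // Only the lead bytes 0xC2 and 0xC3 encode U+0080..U+00FF;
    // 0xC0 and 0xC1 would be overlong encodings.
    if ((lead != 0xC2 && lead != 0xC3) || i + 1 >= in.size()) {
      return std::nullopt;
    }
    const auto cont = static_cast<std::uint8_t>(in[++i]);
    if ((cont & 0xC0) != 0x80) {  // must be a continuation byte
      return std::nullopt;
    }
    out.push_back(static_cast<char>(((lead & 0x03) << 6) | (cont & 0x3F)));
  }
  return out;
}
```

A vectorized routine can be checked against a reference like this one, byte for byte.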
This has a marginal effect on the binary size, but it saves a few kilobytes so it might be worth it. On the negative side, it makes the library we...
https://en.wikipedia.org/wiki/Punycode
For speed, we should have a NEON version of streamvbyte_compressedbytes, see https://github.com/lemire/streamvbyte/pull/57/files
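As a point of comparison, the quantity to compute can be sketched in scalar code. This assumes the standard Stream VByte layout (one control byte per group of four integers, then 1 to 4 data bytes per integer); the name `compressed_bytes_scalar` is hypothetical and is not the library's function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar reference: number of bytes needed to encode
// `length` integers, assuming the standard Stream VByte layout.
std::size_t compressed_bytes_scalar(const std::uint32_t* in, std::size_t length) {
  // Two control bits per integer, packed four integers to a byte.
  const std::size_t control_bytes = (length + 3) / 4;
  std::size_t data_bytes = 0;
  for (std::size_t i = 0; i < length; ++i) {
    const std::uint32_t v = in[i];
    // 1 byte up to 0xFF, 2 up to 0xFFFF, 3 up to 0xFFFFFF, else 4.
    data_bytes += 1 + (v > 0xFF) + (v > 0xFFFF) + (v > 0xFFFFFF);
  }
  return control_bytes + data_bytes;
}
```

A NEON version would compute the same sum several integers at a time; the scalar loop is useful for the tail and for validation.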