Daniel Lemire

293 issues by Daniel Lemire

When deserializing a bitmap, the result may be invalid, for example because of data corruption. The deserialization could still generate a bitmap without failure,...
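To illustrate the kind of post-deserialization check this issue is about, here is a minimal sketch. It assumes, purely for illustration, that part of a bitmap is stored as a sorted array of distinct 16-bit values; the function name `sorted_container_is_valid` is hypothetical and is not taken from any library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check (not a library API): returns 1 when `values` is
 * strictly increasing, 0 otherwise. Corrupted input can deserialize
 * "successfully" yet break this invariant. */
static int sorted_container_is_valid(const uint16_t *values, size_t count) {
    assert(values != NULL || count == 0);
    for (size_t i = 1; i < count; i++) {
        /* A duplicate or out-of-order value means the data is invalid. */
        if (values[i - 1] >= values[i]) {
            return 0;
        }
    }
    return 1;
}
```

A caller would run such a check once after deserializing untrusted data and reject the bitmap if it fails, rather than relying on the deserializer to report an error.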

help wanted
good first issue

As far as I can tell, CPM currently has 'git' as a hard dependency, but it does not warn the user if git is missing. Currently, if git is missing,...

The following blog post could be useful if we ever want to parse `uint8_t` values: https://lemire.me/blog/2023/11/28/parsing-8-bit-integers-quickly/
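For reference, a plain scalar baseline for this task might look like the sketch below. It is not the technique from the blog post, only a simple loop that a faster routine would need to match in behavior; the name `parse_uint8_scalar` is hypothetical, and accepting leading zeros is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of any library): parses a decimal string
 * of exactly `len` characters into a uint8_t. Returns 1 on success and 0
 * when the input is empty, longer than three characters, contains a
 * non-digit, or exceeds 255. Leading zeros are accepted (assumption). */
static int parse_uint8_scalar(const char *str, size_t len, uint8_t *out) {
    assert(str != NULL && out != NULL);
    if (len == 0 || len > 3) {
        return 0;
    }
    unsigned value = 0;
    for (size_t i = 0; i < len; i++) {
        /* Characters below '0' wrap around to a large unsigned value. */
        unsigned digit = (unsigned)(unsigned char)str[i] - '0';
        if (digit > 9) {
            return 0;
        }
        value = value * 10 + digit;
    }
    if (value > 255) {
        return 0;
    }
    *out = (uint8_t)value;
    return 1;
}
```

With this sketch, `"255"` parses successfully while `"256"`, `"1a"`, and the empty string are rejected.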

The library is already correct on these cases, but we should be explicit. Contributed by @deadalnix ``` 1/ Where float stop rounding to 0. 0.7006492321624085354e-45 0.7006492321624085355e-45 2/ Where float start...

The code in src/util/simdutf8check.h is suboptimal. It could be replaced either by...
1. [simdutf](https://github.com/simdutf/simdutf) which provides full support for Unicode (transcoding, validation, and so forth).
2. [is_utf8](https://github.com/simdutf/is_utf8) which provides just...

type/enhancement
no-issue-activity

We only change the bulk of the processing, leaving the tail intact. It is currently no faster. GCC 12, Icelake. Main branch: ``` convert_latin1_to_utf8+icelake, input size: 199331, iterations: 30000, dataset:...

Currently, we are using simple UTF-8 to Latin 1 routines, but as remarked by @aqrit and others, they could be, and should be, better. @aqrit suggests something like this: https://gist.github.com/aqrit/ebcbd13a43ac4ee4ef05578074ad3631

This has a marginal effect on the binary size, but it saves a few kilobytes so it might be worth it. On the negative side, it makes the library we...

https://en.wikipedia.org/wiki/Punycode

For speed, we should have a NEON version of streamvbyte_compressedbytes, see https://github.com/lemire/streamvbyte/pull/57/files
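As a point of comparison for a NEON version, a scalar sketch of the size computation is shown below. It assumes the standard Stream VByte layout (one 2-bit control code per integer, packed four per control byte, followed by 1 to 4 data bytes per integer); the name `svb_compressed_bytes_scalar` is hypothetical and this is not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scalar reference (not the library's code): number of bytes
 * needed to encode `length` integers, assuming the standard layout. */
static size_t svb_compressed_bytes_scalar(const uint32_t *in, uint32_t length) {
    assert(in != NULL || length == 0);
    /* One control byte covers up to four integers. */
    size_t control_bytes = ((size_t)length + 3) / 4;
    size_t data_bytes = 0;
    for (uint32_t i = 0; i < length; i++) {
        uint32_t v = in[i];
        /* 1 byte below 2^8, 2 below 2^16, 3 below 2^24, otherwise 4. */
        data_bytes += 1 + (v > 0xFF) + (v > 0xFFFF) + (v > 0xFFFFFF);
    }
    return control_bytes + data_bytes;
}
```

A vectorized version would compute the same per-value byte counts several integers at a time, and a scalar routine like this one can serve as the correctness check for it.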

help wanted