Ash Vardanian

Results 72 issues of Ash Vardanian

Following the [discussion](https://github.com/ashvardanian/StringZilla/issues/137#issuecomment-2062228429) in #137, it would be great to reach some uniformity in feature detection on x86 and Arm. On the latter, we can't yet use SVE, and only...

help wanted
good first issue

StringZilla currently passes ASAN checks, unit tests, and fuzz tests across several programming languages on macOS, Linux, and Windows on every PR. This, however, is not the same as taking...

good first issue
cpp

## Features

- [ ] Better hashing algorithms
- [ ] Automata-based fuzzy searching algorithms

## Breaking naming and organizational changes

- [ ] Rename `edit_distance` to `levenshtein_distance` to match...

In `python/lib.c` several classes store a combination of a pointer and length. It's worth refactoring the file to use the `sz_string_view_t` structure.

enhancement
good first issue
python

In C++ we have special smart iterators for bulk search and split operations. They lazily report the matches, avoiding heap allocations for the array of match offsets. For that, an...

good first issue
rust
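
To illustrate the lazy-reporting idea from the issue above, here is a minimal sketch in Python rather than C++ or Rust. The function name `iter_matches` and its `overlapping` flag are hypothetical and are not part of StringZilla; the sketch only shows how a generator can yield match offsets one at a time instead of allocating an array of them.

```python
from typing import Iterator


def iter_matches(haystack: str, needle: str, overlapping: bool = False) -> Iterator[int]:
    """Yield offsets of `needle` in `haystack` lazily, one match at a time.

    Hypothetical helper for illustration only; not a StringZilla API.
    """
    if not needle:
        return  # this sketch treats an empty needle as having no matches
    # After a match, either step by one character (overlapping matches)
    # or skip past the whole match (non-overlapping matches).
    step = 1 if overlapping else len(needle)
    start = haystack.find(needle)
    while start != -1:
        yield start
        start = haystack.find(needle, start + step)


# The caller can stop early, and no list of offsets is ever built.
for offset in iter_matches("abracadabra", "abra"):
    print(offset)  # prints 0, then 7
```

Because the generator keeps only the current offset, memory use stays constant regardless of how many matches the haystack contains.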

In StringZilla, a 64-bit rolling hash function is reused for both string hashes and substring hashes (Rabin-style fingerprints). Rolling hashes take the same amount of time to compute hashes with...

help wanted
performance
core
huge
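
For readers unfamiliar with Rabin-style fingerprints, the following is a minimal sketch of a generic 64-bit polynomial rolling hash. It is not StringZilla's actual hash: the multiplier `BASE`, the wrap-around modulus of 2^64, and the function name `rolling_hashes` are all assumptions chosen for illustration.

```python
MASK64 = (1 << 64) - 1  # keep every intermediate value within 64 bits
BASE = 257              # assumed multiplier, not the one StringZilla uses


def rolling_hashes(data: bytes, window: int) -> list[int]:
    """Return the polynomial hash of every `window`-byte slice of `data`.

    Each step after the first costs O(1): drop the outgoing byte,
    shift by BASE, and add the incoming byte.
    """
    if window <= 0 or window > len(data):
        return []
    # Weight of the oldest byte in the window: BASE ** (window - 1) mod 2**64.
    high = pow(BASE, window - 1, 1 << 64)
    h = 0
    for byte in data[:window]:
        h = (h * BASE + byte) & MASK64
    hashes = [h]
    for i in range(window, len(data)):
        h = (h - data[i - window] * high) & MASK64  # remove the outgoing byte
        h = (h * BASE + data[i]) & MASK64           # append the incoming byte
        hashes.append(h)
    return hashes


fingerprints = rolling_hashes(b"abcabc", 3)
# Equal windows produce equal fingerprints: "abc" at offsets 0 and 3.
print(fingerprints[0] == fingerprints[3])  # prints True
```

The per-window cost is independent of the window length, which is the property that makes such hashes usable for substring fingerprints.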

Python strings offer a lot of powerful methods, such as:

- `isalnum`, `isalpha`, `isascii`, `isdecimal`, `isdigit`, `isspace`, `islower`, `isupper`, `istitle`, `isnumeric` for checks.
- `lower` and `upper` that copy the...

huge

Commit 4c738ea446cb8f9041077ac6364557527e8fc427 introduces a prototype of a StringZilla-based command-line toolkit, including a replacement for the `split` utility. The original prototype suggests a **4x performance improvement** opportunity, but it can't currently handle multiple...

good first issue
python

The upcoming implementation contains only a serial variant. Once the demand for a more efficient implementation grows, one can add a SWAR accelerated implementation. Draft:

```c
SZ_PUBLIC sz_size_t sz_hamming_distance_utf8( //...
```

good first issue
core
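
As a rough illustration of what a serial UTF-8 Hamming distance computes, here is a minimal Python sketch. It is not the draft C implementation from the issue: the function name mirrors the draft only loosely, and the handling of inputs with different code-point counts (each extra code point counts as one mismatch) is an assumption.

```python
def hamming_distance_utf8(a: bytes, b: bytes) -> int:
    """Count positions at which the decoded code points of `a` and `b` differ.

    Assumption for this sketch: when one input has more code points than
    the other, every extra code point counts as a mismatch.
    """
    runes_a = a.decode("utf-8")
    runes_b = b.decode("utf-8")
    # Compare code points pairwise over the shared prefix length.
    mismatches = sum(1 for x, y in zip(runes_a, runes_b) if x != y)
    # Add the length difference, measured in code points rather than bytes.
    return mismatches + abs(len(runes_a) - len(runes_b))


# "h\u00e9llo" is one byte longer than "hello" in UTF-8,
# yet the two differ in exactly one code point.
print(hamming_distance_utf8("h\u00e9llo".encode("utf-8"), b"hello"))  # prints 1
```

Comparing code points instead of bytes is what distinguishes this from a plain byte-wise Hamming distance, where a single multi-byte character would shift every later position.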

The C++ class implements a `replace_all` operation that can be used to replace all occurrences of a substring or a character with a different string. The implementation is designed to...

help wanted
good first issue
cpp
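
The semantics of a `replace_all` operation as described in the issue above can be sketched in Python as follows. This is a generic left-to-right, non-overlapping replacement and says nothing about how the C++ class is actually implemented; the function below is a hypothetical stand-in, and treating an empty needle as a no-op is an assumption.

```python
def replace_all(text: str, needle: str, replacement: str) -> str:
    """Replace every non-overlapping occurrence of `needle`, scanning left to right.

    Hypothetical illustration; not the StringZilla C++ implementation.
    """
    if not needle:
        return text  # assumption: an empty needle leaves the text unchanged
    parts = []
    start = 0
    while True:
        pos = text.find(needle, start)
        if pos == -1:
            break
        parts.append(text[start:pos])  # untouched text before the match
        parts.append(replacement)
        start = pos + len(needle)      # resume scanning after the match
    parts.append(text[start:])         # trailing text after the last match
    return "".join(parts)


print(replace_all("a-b-c", "-", "--"))  # prints a--b--c
```

Collecting the pieces and joining them once avoids repeatedly copying the growing result, which matters when the replacement and the needle differ in length.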