Chris Ha
❓ Questions and Help: What is the license for the tokenizer model used in NLLB (SPM-200)? The NLLB model itself is cc-by-nc-4.0, but it is...
Some small fixes regarding formatting
I'm looking into the codebase while trying to implement #280, and while reading the code I also applied some lints to understand it better. The commits are...
I am interested in [multiset](https://en.wikipedia.org/wiki/Multiset) collections such as [Python's `Counter`](https://docs.python.org/3/library/collections.html#collections.Counter) or Rust's [`Counter`](https://crates.io/crates/counter). I see a few possible pathways: 1. Implement it on my own in the downstream project I...
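For context, a multiset can be layered over a hash map from item to multiplicity, which is roughly what "implement it on my own downstream" could look like. The sketch below uses only `std`; the `Multiset` type and its method names are hypothetical and not the API of the `counter` crate or of Python's `Counter`.

```rust
use std::collections::HashMap;
use std::hash::Hash;

/// Hypothetical multiset: each distinct item maps to how many times it occurs.
#[derive(Debug)]
struct Multiset<T: Eq + Hash> {
    counts: HashMap<T, usize>,
}

impl<T: Eq + Hash> Multiset<T> {
    fn new() -> Self {
        Multiset { counts: HashMap::new() }
    }

    /// Add one occurrence of `item`.
    fn insert(&mut self, item: T) {
        *self.counts.entry(item).or_insert(0) += 1;
    }

    /// Remove one occurrence of `item`; returns false if it was not present.
    fn remove_one(&mut self, item: &T) -> bool {
        let Some(c) = self.counts.get_mut(item) else { return false };
        *c -= 1;
        if *c == 0 {
            // Drop the key entirely so absent items never linger with count 0.
            self.counts.remove(item);
        }
        true
    }

    /// Multiplicity of `item` (0 if absent).
    fn count(&self, item: &T) -> usize {
        self.counts.get(item).copied().unwrap_or(0)
    }

    /// Total number of elements, counting multiplicity.
    fn len(&self) -> usize {
        self.counts.values().sum()
    }
}

/// Build a multiset by counting every item of an iterator.
impl<T: Eq + Hash> std::iter::FromIterator<T> for Multiset<T> {
    fn from_iter<I: IntoIterator<Item = T>>(iter: I) -> Self {
        let mut m = Multiset::new();
        for item in iter {
            m.insert(item);
        }
        m
    }
}

fn main() {
    let words: Multiset<&str> = vec!["a", "b", "a", "c", "a"].into_iter().collect();
    println!("count(a) = {}, total = {}", words.count(&"a"), words.len());
}
```

Removing the key when its count reaches zero keeps `count` and `len` consistent without having to filter out zero entries later.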
Changing this order will make it much faster, especially for larger arrays.
https://github.com/ekzhu/datasketch/blob/ebe4ca4a5ddf5763df8ea80a9b6851a6044b1fd0/datasketch/minhash.py#L12 In this implementation of MinHash, it seems the hasher uses 32 bits (`sha1_hash32`). Why is the `_max_hash = np.uint64((1`...
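To illustrate how a 32-bit hasher and a hash-range mask interact, here is a minimal Rust sketch of MinHash with universal-hash permutations h(x) = ((a*x + b) mod p) & mask. It assumes p = 2^61 - 1 (a Mersenne prime) and a mask of 2^32 - 1 chosen to match the 32-bit input hashes; the exact value on the linked line is cut off above, so treat these constants as assumptions. The `splitmix64` parameter generator and every function name here are hypothetical stand-ins, not datasketch's code.

```rust
/// Assumed modulus: the Mersenne prime 2^61 - 1.
const MERSENNE_PRIME: u64 = (1 << 61) - 1;
/// Assumed mask: keeps permuted values in the same 32-bit range as the inputs.
const MAX_HASH: u64 = (1 << 32) - 1;

/// SplitMix64, used here only as a simple seeded source of (a, b) parameters.
fn splitmix64(state: &mut u64) -> u64 {
    *state = state.wrapping_add(0x9E37_79B9_7F4A_7C15);
    let mut z = *state;
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Draw `num_perm` pairs with a in [1, p) and b in [0, p).
fn make_params(num_perm: usize, seed: u64) -> Vec<(u64, u64)> {
    let mut state = seed;
    (0..num_perm)
        .map(|_| {
            let a = 1 + splitmix64(&mut state) % (MERSENNE_PRIME - 1);
            let b = splitmix64(&mut state) % MERSENNE_PRIME;
            (a, b)
        })
        .collect()
}

/// One permutation of a 32-bit hash; u128 avoids overflow since a*x < 2^93.
fn permute(hv: u32, a: u64, b: u64) -> u64 {
    let v = (a as u128 * hv as u128 + b as u128) % MERSENNE_PRIME as u128;
    (v as u64) & MAX_HASH
}

/// Signature slot i = minimum permuted value under parameter pair i.
fn minhash_signature(hashes: &[u32], params: &[(u64, u64)]) -> Vec<u64> {
    params
        .iter()
        .map(|&(a, b)| hashes.iter().map(|&hv| permute(hv, a, b)).min().unwrap_or(MAX_HASH))
        .collect()
}

/// Fraction of matching slots, an estimate of Jaccard similarity.
fn jaccard_estimate(s1: &[u64], s2: &[u64]) -> f64 {
    let equal = s1.iter().zip(s2).filter(|(x, y)| x == y).count();
    equal as f64 / s1.len() as f64
}

fn main() {
    let params = make_params(128, 1);
    let s1 = minhash_signature(&[1, 2, 3, 4], &params);
    let s2 = minhash_signature(&[2, 3, 4, 5], &params);
    println!("estimated Jaccard: {:.3}", jaccard_estimate(&s1, &s2));
}
```

Under these assumptions, the wide prime keeps the permutation close to uniform, while the mask keeps signature values within the 32-bit range of the underlying hasher.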
Currently, the default `cargo clippy` shows a lot of potential fixes. I've prepared an initial batch that could be applied. There are more, and if this PR merges I...
I'm not sure if this is the right place to ask (feel free to point me elsewhere), but would it be possible to also produce WET files from this...
`wyrng` has been shown to have some flaws in its randomness (from the [official repo](https://github.com/wangyi-fudan/wyhash)):

> Both of them are not 64 bit collision resistant, but is about 62 bits (flyingmutant/Cyan4973/vigna)...
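To put "about 62 bits" in perspective, here is a back-of-the-envelope birthday-bound sketch in Rust. It makes the simplifying assumption that 62 effective bits behave like a uniform 62-bit output. Among n outputs the expected number of colliding pairs is about C(n, 2) / 2^bits, so losing 2 bits quadruples the expected collisions for the same n. The function name is hypothetical.

```rust
/// Expected number of colliding pairs among `n` outputs drawn uniformly
/// from a space of 2^bits values (birthday approximation: C(n, 2) / 2^bits).
fn expected_collisions(n: f64, bits: i32) -> f64 {
    n * (n - 1.0) / 2.0 / 2f64.powi(bits)
}

fn main() {
    let n = 2f64.powi(32); // about 4.3 billion outputs
    println!("ideal 64-bit:     {:.3}", expected_collisions(n, 64));
    println!("effective 62-bit: {:.3}", expected_collisions(n, 62));
}
```

For n = 2^32 this gives roughly 0.5 expected collisions at 64 bits versus roughly 2 at 62 bits.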
I wanted to compare the more up-to-date xxhash-rust (xxh3) with wyhash. It seems that for smaller hashes wyhash is faster, and for larger hashes wyhash is faster:

`test wyhash_bench::hash_004_bytes ... bench: 2...`
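Because the result depends on input size, one way to compare hashers is to time each one across a sweep of sizes (4 bytes, as in `hash_004_bytes`, up to larger buffers). The sketch below uses only stable `std` (`Instant`, `std::hint::black_box`). `DefaultHasher` is a stand-in; to compare wyhash and xxh3 you would pass closures wrapping those crates' one-shot hash functions instead. The `time_hash` and `std_hash` names are hypothetical, and this is a rough timing loop, not a replacement for a proper benchmark harness.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Time `iters` calls of `hash_fn` on `input`. Returns the elapsed time and a
/// wrapping sum of the outputs, which keeps the work from being optimized away.
fn time_hash<F: Fn(&[u8]) -> u64>(hash_fn: F, input: &[u8], iters: u32) -> (Duration, u64) {
    let start = Instant::now();
    let mut acc = 0u64;
    for _ in 0..iters {
        acc = acc.wrapping_add(hash_fn(black_box(input)));
    }
    (start.elapsed(), acc)
}

/// Stand-in hasher from std; replace with wyhash or xxh3 to compare them.
fn std_hash(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(bytes);
    h.finish()
}

fn main() {
    const ITERS: u32 = 10_000;
    for &size in &[4usize, 64, 1024, 65536] {
        let input = vec![0xABu8; size];
        let (elapsed, _) = time_hash(std_hash, &input, ITERS);
        println!("{:>6} bytes: {:?} per call", size, elapsed / ITERS);
    }
}
```

Running the same sweep for each hasher on the same machine shows whether the ranking changes between small and large inputs.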