Chris Ha
@Narsil this seems to be the most recent and updated effort to parallelize the unigram trainer. What kind of development or testing is required to move this forward? I would...
I prepared a commit that simply removes the `'data` lifetime to fix the clippy warning. I could not push it due to insufficient permissions. The fix builds and passes all tests.
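For anyone unfamiliar with this kind of fix, here is a minimal, hypothetical sketch. It assumes the warning was clippy's `needless_lifetimes` lint (or a similar one), which flags explicit lifetimes that Rust's elision rules would infer anyway. The function names below are made up for illustration and are not the actual tokenizers code.

```rust
// Hypothetical "before": an explicit 'data lifetime that clippy's
// `needless_lifetimes` lint would flag, since elision infers the same thing.
// The allow attribute only keeps this demo quiet under `cargo clippy`.
#[allow(clippy::needless_lifetimes)]
fn first_word_explicit<'data>(text: &'data str) -> &'data str {
    text.split_whitespace().next().unwrap_or("")
}

// Hypothetical "after": the lifetime is elided. The output still borrows
// from `text`, so the signature and behavior are unchanged.
fn first_word(text: &str) -> &str {
    text.split_whitespace().next().unwrap_or("")
}

fn main() {
    let s = "hello world";
    // Both versions return the same borrowed slice.
    assert_eq!(first_word_explicit(s), first_word(s));
    println!("{}", first_word(s)); // prints "hello"
}
```

Because elision produces exactly the same signature, removing the lifetime is purely cosmetic, which is consistent with the fix building and passing all tests unchanged.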
I think the current fix requirements are minor enough that anybody, including you @mishig25 or @Narsil, could easily fix them, so they would not merit a whole new PR at this time. (as...
Personally, I cannot see why it shouldn't be. As far as I can tell, there has not been any material change in or around the unigram model (unigram/model.rs), the unigram trainer (unigram/trainer.rs) or parallelism...
The following zip file contains the tokenizer trained on big.txt on main, the tokenizer trained from this PR, and their diffs. It seems like the learned vocabulary is the same while the...
I've done some testing on this and it seems to have an unmeasurable impact on speed. In other words, including or excluding byteswap makes no statistically significant difference. I tested using...
I will match @echelon's offer. Also $100 partial bounties for: 1. bringing opus-dev up to date with master 2. any complete SILK implementation 3. any complete CELT implementation 4. if 1...
I think I might start with 1 myself.
@JSDurand thanks for mentioning [opus-native](https://github.com/hasenbanck/opus-native). But it seems that not only do the tests not pass, they don't even compile atm.
I think the only fair and reasonable approach would be to defer to the maintainer atm.