Daniël de Kok
libtorch 2.0.0 was not supported until I merged the 2.0.0 support PR a few minutes ago. The issue is that the Torch C++ API (which the binding uses) is...
> If it's ok, I'd like to tack on another question: could you provide a minimal example of how to call syntaxdot from the commandline? I am not quite sure...
> I realized that it is because of the too strict version of blis in your requirements.

`blis` is intentionally pinned to an older version because `blis` 0.9.x has known...
PR #2097 should fix this, if anyone wants to try it. It still needs to be rebased and reviewed.
This should be fixed by #2080 (#2097 became part of that PR).
I would separate string distance/sequence alignment from tokenization. They are useful for a lot of other things, e.g. we use them for restoring case in named entities after lemmatization. It...
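To make the string-distance part concrete, here is a minimal sketch of Levenshtein edit distance in plain Rust, with no tokenizer involved. The `closest` helper is hypothetical and only illustrates the kind of lookup that could help restore case; it is an assumption for illustration, not the method we actually use.

```rust
/// Levenshtein edit distance between two strings, counted in
/// Unicode scalar values (`char`s), using two rolling rows.
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();

    // prev[j] is the distance between the first i-1 chars of `a`
    // and the first j chars of `b`; the first row is 0, 1, 2, ...
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut cur = vec![0; b.len() + 1];

    for i in 1..=a.len() {
        cur[0] = i;
        for j in 1..=b.len() {
            // Substitution costs 1 only when the characters differ.
            let subst = prev[j - 1] + usize::from(a[i - 1] != b[j - 1]);
            let delete = prev[j] + 1;
            let insert = cur[j - 1] + 1;
            cur[j] = subst.min(delete).min(insert);
        }
        std::mem::swap(&mut prev, &mut cur);
    }

    prev[b.len()]
}

/// Hypothetical helper (not from any crate): return the candidate
/// whose lowercased form is closest to the lowercased target.
/// On ties, the first candidate wins.
fn closest<'a>(target: &str, candidates: &[&'a str]) -> Option<&'a str> {
    let target = target.to_lowercase();
    candidates
        .iter()
        .copied()
        .min_by_key(|c| levenshtein(&target, &c.to_lowercase()))
}
```

For example, `closest("amsterdam", &["Berlin", "Amsterdam"])` picks the original cased form `"Amsterdam"`. Nothing here depends on how the text was tokenized, which is why keeping it separate makes it reusable.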
Rust binding for the Alpino tokenizer (for Dutch): https://crates.io/crates/alpino-tokenizer
I have also created bindings for the `sentencepiece` unsupervised tokenizer (https://crates.io/crates/sentencepiece). I still have to bind the training parts, but currently it allows one to load a sentencepiece model and...
That indeed looks like a solid option that supports a lot of languages. Note though that the Cavnar & Trenkle method has been superseded by more modern and accurate techniques, see e.g.:...
[sticker](https://github.com/danieldk/sticker) is a neural sequence labeler (bidirectional RNNs or dilated convolution networks) that could do named entity recognition. One of our students has done some experiments on the effect of...