Mark van der Loo

84 comments by Mark van der Loo

How is the chisq distance defined?

Confirmed, that seems to be a bug. Seems to be independent of the chosen distance. **Edit.** A bit confusing because I have worked with stringdist on millions of records before....

The binaries for that R version are not on CRAN anymore. Your best option is to upgrade to the latest R version. If that is not possible you need to...

Hi there, for now I don't have the bandwidth to work on this. Realistically, I can possibly look into this somewhere next year. That also includes PRs because those also...

Hi Dan, interesting idea. I did not think about that yet. A few things that come to mind while I commute: leave out the components with small distances (usually relatively...

I think that Jan's implementation does exactly that: save space. Since `stringdist` offloads everything to parallelized `C` code quickly, I agree that it is not likely that we get much...

Thanks, this relates somewhat to #48. In the formal definition of the qgrams[1] distance, we compare two qgram-vectors, where the length of the vectors is equal to the number of...

You can tokenize using one of the many tokenizers available in R, then hash the tokens (words) to integers using the hashr package and then use stringdist::seq_dist. It's basically why I wrote...

Nope, I haven't considered it yet, but I think it should be quite easy to implement by users on top of the existing functionality. Just compute the q-gram (or jaccard...

Ran into this as well. `libprotoc-dev` is in the DESCRIPTION of [RProtoBuf](https://cran.r-project.org/web/packages/RProtoBuf/index.html) but not picked up by the [rule for protobuf-compiler](https://github.com/rstudio/r-system-requirements/blob/master/rules/protobuf-compiler.json).