mark padgham

Results 619 comments of mark padgham

Coincidence: https://twitter.com/bikesRdata/status/1232383846689247232. And yeah, the portability for WebAssembly is a very strong argument for your way of doing things.

Yeah, that is definitely something I'm aware of, and that would be one workable solution. The big advantage of "proper" text processing is (in this context) really just the stemming,...

I've been pondering the scale of this issue. In short: It needs another package because there's been so much amazing development on [text analysis in R](https://www.tidytextmining.com/tidytext.html) that has ultimately led...

See also rOpenSci's own [`tokenizers` package](https://github.com/ropensci/tokenizers), which uses the [`SnowballC` package](https://cran.r-project.org/web/packages/SnowballC/index.html) for the hard work.

@jonocarroll thoughts here please. This code tokenizes the Description texts of all R packages using 3 different packages for the task: ``` db

Yeah, I agree, and actually realised that `flipper` could simply pre-store/cache the tokenized versions of package descriptions anyway, entirely avoiding the speed issue. I'll sketch out a `tokenizers` solution here...
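The caching idea can be illustrated with a minimal sketch in base R. This is not `flipper`'s code: `tokenize_simple()`, `cached_tokens()`, the example descriptions, and the cache file location are all hypothetical, and the regex-based splitting merely stands in for a proper `tokenizers` call with stemming.

```r
# Hypothetical helper: split text into lower-case word tokens using base R
# only. This stands in for a proper tokenizer with stemming.
tokenize_simple <- function (txt) {
    tokens <- strsplit (tolower (txt), "[^a-z0-9]+")
    # Drop empty strings left by leading delimiters
    lapply (tokens, function (i) i [nzchar (i)])
}

# Hypothetical helper: tokenize once and store the result on disk; later
# calls read the stored tokens instead of re-tokenizing.
cached_tokens <- function (descriptions, cache_file) {
    if (file.exists (cache_file)) {
        return (readRDS (cache_file))
    }
    tokens <- tokenize_simple (descriptions)
    names (tokens) <- names (descriptions)
    saveRDS (tokens, cache_file)
    tokens
}

# Made-up descriptions, for illustration only
descriptions <- c (
    pkgA = "Fast tokenizers for text.",
    pkgB = "Cache package descriptions"
)
cache_file <- tempfile (fileext = ".Rds")

tokens <- cached_tokens (descriptions, cache_file) # computes and writes cache
tokens2 <- cached_tokens (descriptions, cache_file) # reads from cache
```

The second call never touches the tokenizer, so the per-query cost is just one `readRDS()`, regardless of how slow the tokenization step is.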

Yeah, that's a solid idea, especially for the ease of a `print` method.

The only extra info that is directly available via GH API v4 that is likely useful here would be [`labels`](https://developer.github.com/v4/object/labelconnection/), but it would of course also be easy to trawl...

Oh no, whoops, the `labels` I flagged are actually just the issue labels. What I meant was of course [`repositoryTopics`](https://developer.github.com/v4/object/repositorytopic/).
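For reference, a query for those topics might look something like the sketch below. It only builds the GraphQL query string and sends nothing; `topics_query()` and its parameters are hypothetical names, and the owner/repository values are just an example.

```r
# Hypothetical helper: build a GraphQL query string requesting the topics
# of a single repository. `owner`, `repo`, and `n` are illustrative.
topics_query <- function (owner, repo, n = 10L) {
    sprintf ('{
  repository(owner: "%s", name: "%s") {
    repositoryTopics(first: %d) {
      nodes {
        topic {
          name
        }
      }
    }
  }
}', owner, repo, as.integer (n))
}

q <- topics_query ("ropensci", "tokenizers")
cat (q)
```

The resulting string would still need to be POSTed to the GraphQL endpoint with an authorization token; that step is left out here.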

Thanks for checking all that out - that naming as `GITHUB_GRAPHQL_TOKEN` was just copied directly from [`ghrecipes`](https://github.com/ropenscilabs/ghrecipes/blob/master/R/zzz.R#L24). I'm not (yet) sure of the details of what is necessary in terms...
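As a rough sketch of how a token stored under that name could be picked up (this is not the `ghrecipes` code; `get_gh_token()` is a hypothetical helper, and the variable name is simply passed as a default argument so it can be changed):

```r
# Hypothetical helper: read a GitHub token from an environment variable,
# failing early with a clear message when it is not set.
get_gh_token <- function (var = "GITHUB_GRAPHQL_TOKEN") {
    token <- Sys.getenv (var, unset = "")
    if (!nzchar (token)) {
        stop ("Environment variable '", var, "' is not set", call. = FALSE)
    }
    token
}
```

The variable would typically be defined in `~/.Renviron`, so that it is available in every session without appearing in any script.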