Jon Harmon

Results: 78 issues by Jon Harmon

Might be Windows-specific. Downloading "bert_large_uncased_wwm" failed (the actual download.file step). Removing `method = "libcurl"` fixed it. I don't remember why we specify the method, need to try on different OSs....

`extract_features`, `download_BERT_checkpoint`, and probably some other functions use the model parameter, with a hard-coded list of models. Investigate listing those models in one place and automatically updating the formals of...

enhancement

As an RBERT user, I'd like the tokenizer to be as fast as it can be, so that I don't have to wait for this step more than is absolutely...

We really need a tiny checkpoint for tests. We currently include the smallest one we can (bert_base_uncased) via git-lfs, but I'd definitely like that to be smaller. Including it allows...

enhancement

This can wait 'til after we require TF2, but... it'd be nice if we ran `tensorflow::tf_version()` and prompted users to install TensorFlow if they don't have the version we require.
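
A rough sketch of what such a check could look like. The minimum version (`required_tf_version`) and both helper names are assumptions for illustration, not part of RBERT; only `tensorflow::tf_version()` and `tensorflow::install_tensorflow()` come from the tensorflow package.

```r
# Hypothetical minimum version; the actual requirement is not decided here.
required_tf_version <- "2.0.0"

# TRUE if `installed` satisfies `required`. NULL means TensorFlow was not found.
tf_version_ok <- function(installed, required = required_tf_version) {
  !is.null(installed) &&
    as.numeric_version(installed) >= as.numeric_version(required)
}

# Hypothetical helper: look up the installed version and prompt if it is too old.
check_tf_version <- function(required = required_tf_version) {
  # tf_version() returns NULL when TensorFlow is not available.
  installed <- if (requireNamespace("tensorflow", quietly = TRUE)) {
    tensorflow::tf_version()
  } else {
    NULL
  }
  if (!tf_version_ok(installed, required)) {
    message(
      "TensorFlow >= ", required, " is required. ",
      "Consider running tensorflow::install_tensorflow()."
    )
    return(invisible(FALSE))
  }
  invisible(TRUE)
}
```

Keeping the comparison in `tf_version_ok()` separate from the lookup means the version logic can be tested without TensorFlow installed.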

enhancement

If the user sends us the same text 100x, we shouldn't take the time to run it through BERT 100x. Uniquify the input, then join at the end to get back the full list.
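
A minimal base R sketch of the uniquify-then-join idea. `slow_featurize()` is a hypothetical stand-in for the expensive BERT step (it just counts characters), and `featurize_unique()` is an illustrative name, not an RBERT function.

```r
# Hypothetical stand-in for the expensive per-text step (not an RBERT function).
slow_featurize <- function(texts) {
  vapply(texts, nchar, integer(1), USE.NAMES = FALSE)
}

# Run `fun` once per distinct text, then expand back to the input order.
featurize_unique <- function(texts, fun = slow_featurize) {
  unique_texts <- unique(texts)
  unique_results <- fun(unique_texts)
  # match() gives, for each input text, its position among the unique texts.
  unique_results[match(texts, unique_texts)]
}

texts <- c("hello world", "foo", "hello world", "foo", "hello world")
features <- featurize_unique(texts)
```

Here `fun` only sees two distinct texts, but `features` still has one entry per input element, in the original order. For results that are not simple vectors (for example one data frame per text), the same `match()` index could drive a join instead of `[`.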

The tokenizer for a given model is deterministic (it only depends on the vocab file + whether it's cased). Producing the tokenizer takes 100x as long as loading a pre-processed...

I'm not sure yet if this should be inside RBERT or maybe integrated into [tidymodels/textrecipes](https://github.com/tidymodels/textrecipes), but we should make it easy to extract features from text in some sort of...

enhancement

Somewhere we should add documentation about the available models. Maybe make a data object of names and urls (and use that in `.get_model_url`), or just include the info in `download_BERT_checkpoint`.
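
One possible shape for that data object, as a sketch only. The two model names are the ones mentioned in these issues; the URLs are placeholders rather than the real checkpoint locations, and `bert_model_urls` and `lookup_model_url()` are hypothetical names.

```r
# Single source of truth for model names. URLs are placeholders, not the
# real checkpoint locations.
bert_model_urls <- c(
  bert_base_uncased = "https://example.com/bert_base_uncased.zip",
  bert_large_uncased_wwm = "https://example.com/bert_large_uncased_wwm.zip"
)

# Hypothetical lookup: the default for `model` is derived from the table, so
# match.arg() validates against whatever models are currently listed.
lookup_model_url <- function(model = names(bert_model_urls)) {
  model <- match.arg(model)
  bert_model_urls[[model]]
}

lookup_model_url("bert_large_uncased_wwm")
```

Because the allowed values come from `names(bert_model_urls)`, adding a model to the table would update both the validation and anything in the documentation generated from the same object.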

documentation
CRAN