Jon Harmon issues

Results: 78 issues by Jon Harmon

Update README to point to torchtransformers.

DL of large model failed

Might be Windows-specific. Downloading "bert_large_uncased_wwm" failed at the actual `download.file` step. Removing `method = "libcurl"` fixed it. I don't remember why we specify the method; need to try on different OSs....

Share model List Between Functions

`extract_features`, `download_BERT_checkpoint`, and probably some other functions use the model parameter, with a hard-coded list of models. Investigate listing those models in one place and automatically updating the formals of...

enhancement

Rewrite and Speed Up Tokenizer

As an RBERT user, I'd like the tokenizer to be as fast as it can be, so that I don't have to wait for this step more than is absolutely...

Fake tiny checkpoint

We really need a tiny checkpoint for tests. We currently include the smallest one we can (bert_base_uncased) via git-lfs, but I'd definitely like that to be smaller. Including it allows...

enhancement

Prompt to Install tensorflow

This can wait 'til after we require TF2, but... it'd be nice if we ran `tensorflow::tf_version()` and prompted users to install TensorFlow if they don't have the version we require.

enhancement

uniquify incoming text

If the user sends us the same text 100x, we shouldn't take the time to BERT that 100x. Uniquify then join at the end to get back the full list.
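
A minimal sketch of the uniquify-then-join idea, in base R. The names `slow_embed` and `embed_unique` are hypothetical and are not RBERT functions; `slow_embed` just counts characters as a stand-in for the expensive per-text step. The sketch assumes that step returns one result per input text, in order.

```r
# Hypothetical stand-in for an expensive per-text step (not an RBERT function).
# Here it only counts characters so the example runs anywhere.
slow_embed <- function(texts) {
  vapply(texts, nchar, integer(1), USE.NAMES = FALSE)
}

# Run `fn` once per distinct text, then expand back to the original order.
embed_unique <- function(texts, fn) {
  unique_texts <- unique(texts)
  unique_results <- fn(unique_texts)
  # match() gives, for each input, the position of its text in unique_texts,
  # so indexing by it restores the full-length, original-order result.
  unique_results[match(texts, unique_texts)]
}

texts <- c("hello world", "goodbye", "hello world", "hello world")
result <- embed_unique(texts, slow_embed)
```

Here `slow_embed` sees only two distinct texts, while `result` still has one entry per input. If the real step returned a list or a data frame per text, the same `match()` index could be used to subset a list or to drive a join instead.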

Save tokenizer as part of model

The tokenizer for a given model is deterministic (it only depends on the vocab file + whether it's cased). Producing the tokenizer takes 100x as long as loading a pre-processed...

step_rbert_features

I'm not sure yet if this should be inside RBERT or maybe integrated into [tidymodels/textrecipes](https://github.com/tidymodels/textrecipes), but we should make it easy to extract features from text in some sort of...

enhancement

Document available models

Somewhere we should add documentation about the available models. Maybe make a data object of names and urls (and use that in `.get_model_url`), or just include the info in `download_BERT_checkpoint`.

documentation

CRAN