Jonathan Bratt issues

Results: 9 issues by Jonathan Bratt

change package tests to work with tiny BERT checkpoint

Currently, we test with BERT_base, but we may as well use the smallest available checkpoint.

figure out TF warning message

It doesn't complain right away, but if you run enough models, you get a message like: > WARNING:tensorflow:5 out of the last 6 calls to triggered tf.function retracing. Tracing is...

figure out better way to pass token type ids to model

Currently, I attach `tt_ids` as an attribute to the tokenized input in `tokenize_input`. It feels like a misuse of attributes, but I also don't want to, say, pass the tt_ids...
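One possible alternative to attaching the token type ids as an attribute is to return them alongside the token ids in a small structured object. The sketch below is only an illustration in Python, not the package's code: `TokenizedInput` and `combine_segments` are hypothetical names, and the 0/1 segment labelling follows the usual BERT convention for a two-segment input.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TokenizedInput:
    """Hypothetical container keeping token ids and token type ids together."""

    token_ids: List[int]
    token_type_ids: List[int]

    def __post_init__(self) -> None:
        # Safety check: the two sequences must line up position by position.
        if len(self.token_ids) != len(self.token_type_ids):
            raise ValueError("token_ids and token_type_ids must have equal length")


def combine_segments(segment_a: List[int], segment_b: List[int]) -> TokenizedInput:
    """Concatenate two already-tokenized segments (hypothetical helper).

    Positions from the first segment get type id 0, the second get type id 1.
    """
    token_ids = list(segment_a) + list(segment_b)
    token_type_ids = [0] * len(segment_a) + [1] * len(segment_b)
    return TokenizedInput(token_ids, token_type_ids)
```

With this shape, downstream code reads `result.token_type_ids` explicitly rather than relying on a hidden attribute, and the length check catches misaligned inputs early.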

start using assert package for safety checks?

I think we decided this would be a good thing to do.

(TF2) improve functions in functions-to-improve.R

The filename says it all. These were hastily written to get the branch into a working state, and should be refactored with more attention paid to speed and safety.

Decide how to handle tokenization conventions

There are several tokenization conventions (e.g. the token used for padding, separating segments, etc.) that need to be specified when doing the wordpiece tokenization for BERT. Currently, some of these...
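As a rough illustration of the idea, the conventions could be collected in a single object and passed explicitly to whatever assembles the model input. This is a sketch in Python under stated assumptions, not the package's implementation: `TokenConventions` and `assemble_pair` are hypothetical names, and the defaults are the special tokens found in the standard BERT vocabularies.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TokenConventions:
    """Hypothetical bundle of tokenization conventions."""

    pad_token: str = "[PAD]"  # fills unused positions up to max_len
    cls_token: str = "[CLS]"  # marks the start of the input
    sep_token: str = "[SEP]"  # closes each segment


def assemble_pair(
    segment_a: List[str],
    segment_b: List[str],
    max_len: int,
    conventions: TokenConventions = TokenConventions(),
) -> List[str]:
    """Join two wordpiece-tokenized segments and pad to max_len (hypothetical helper)."""
    tokens = (
        [conventions.cls_token]
        + list(segment_a)
        + [conventions.sep_token]
        + list(segment_b)
        + [conventions.sep_token]
    )
    if len(tokens) > max_len:
        raise ValueError("input is longer than max_len")
    # Pad on the right with the configured padding token.
    return tokens + [conventions.pad_token] * (max_len - len(tokens))
```

Keeping the conventions in one place means a checkpoint with different special tokens only needs a different `TokenConventions` instance, rather than changes scattered through the tokenization code.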

Convert to TF2/keras

This takes priority over fixing for TF 1.14.

enhancement

Test/debug code in run_classifier.R

`run_classifier.R` is mostly about fine-tuning BERT with a classifier head. RBERT is not quite working at this level yet. There are almost certainly bugs that would prevent this from working...

bug

CRAN

Load BERT-esque checkpoints in pytorch formats

The original BERT checkpoints released by Google are in a TensorFlow format. It seems that most of the related work done by other teams is in the [PyTorch](https://github.com/huggingface/pytorch-transformers) implementation. In...

enhancement

help wanted