keras-nlp
Modular Natural Language Processing workflows with Keras
An implementation of OpenAI's byte-pair encoder that runs in TF-compatible graph mode, which would allow end-to-end development of pretrained models that use this tokenizer (RoBERTa, GPT, etc.). Currently...
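A graph-mode tokenizer can live directly inside a `tf.data` pipeline. Below is a minimal sketch assuming the `BytePairTokenizer` API that later shipped in keras-nlp; `vocab.json` and `merges.txt` are placeholder paths standing in for the files a GPT-2 or RoBERTa checkpoint would provide.

```python
import tensorflow as tf
import keras_nlp

# Placeholder vocab/merge files; real GPT-2/RoBERTa checkpoints ship equivalents.
tokenizer = keras_nlp.tokenizers.BytePairTokenizer(
    vocabulary="vocab.json",
    merges="merges.txt",
)

# Because the tokenizer is graph-compatible, it can map over a tf.data pipeline.
ds = tf.data.Dataset.from_tensor_slices(["a quick brown fox", "hello world"])
ds = ds.map(tokenizer, num_parallel_calls=tf.data.AUTOTUNE)
```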
**Describe the bug** ByteTokenizer only accepts integers for some parameters, which is fine, but it doesn't fail until runtime, which makes it seem like the input is the issue...
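The fix suggested by the report is to validate at construction time rather than letting a TF op fail later. A minimal sketch of that validation (`sequence_length` and `replacement_char` are real `ByteTokenizer` arguments; the eager type check is the proposed change, not the current implementation):

```python
class ByteTokenizer:
    def __init__(self, sequence_length=None, replacement_char=65533):
        # Fail fast at construction instead of deep inside a TF op at runtime.
        if sequence_length is not None and not isinstance(sequence_length, int):
            raise ValueError(
                "`sequence_length` must be an integer, received "
                f"{type(sequence_length).__name__}: {sequence_length!r}"
            )
        if not isinstance(replacement_char, int):
            raise ValueError(
                "`replacement_char` must be an integer, received "
                f"{type(replacement_char).__name__}: {replacement_char!r}"
            )
        self.sequence_length = sequence_length
        self.replacement_char = replacement_char
```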
We have numerous checkpoints for text encoders, but there's a lot of value in offering ready-to-go fine-tuned models as well. Thoughts: * Let's start with fine-tuning on SST using...
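For context, the fine-tuning recipe being proposed might look roughly like the sketch below, using the `glue/sst2` dataset from TFDS. The `from_preset` call and the preset id mirror the API keras-nlp later adopted and are assumptions here, not part of the proposal itself.

```python
import tensorflow_datasets as tfds
import keras_nlp

# SST-2 from TFDS; the features are "sentence" and "label".
train_ds = tfds.load("glue/sst2", split="train", batch_size=32)
train_ds = train_ds.map(lambda x: (x["sentence"], x["label"]))

# Assumed preset id; task models ship compiled with sensible defaults.
classifier = keras_nlp.models.BertClassifier.from_preset(
    "bert_base_en_uncased",
    num_classes=2,
)
classifier.fit(train_ds, epochs=3)
```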
In #361 we developed unique string ids and used them for intelligent defaults in `BertClassifier`. We should add the same defaults to the `BertBase`, `BertLarge`, etc. classes as well. * This should...
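One way this could look is a `weights` argument defaulting to the string id, resolved through a checkpoint registry. A sketch under stated assumptions: the registry contents, the placeholder URL, and the `create_bert_backbone` builder are all hypothetical.

```python
from tensorflow import keras

# Hypothetical registry keyed by the string ids from #361.
CHECKPOINTS = {
    "bert_base_uncased_en": {
        "weights_url": "https://.../bert_base_uncased_en.h5",  # placeholder
    },
}

def BertBase(weights="bert_base_uncased_en", **kwargs):
    model = create_bert_backbone(**kwargs)  # hypothetical builder
    if weights is not None:
        path = keras.utils.get_file(origin=CHECKPOINTS[weights]["weights_url"])
        model.load_weights(path)
    return model
```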
@mattdangerw, @chenmoneygithub - The original RoBERTa implementation has four different dropout variables: https://github.com/facebookresearch/fairseq/blob/main/fairseq/models/roberta/model.py#L634-L637. Our `RobertaCustom` layer, on the other hand, has only one: https://github.com/keras-team/keras-nlp/blob/master/keras_nlp/models/roberta.py#L86. In order to incorporate all four...
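One option is to expose all four as separate constructor arguments. A sketch of what that signature might look like; the argument names mirror fairseq's (`dropout`, `attention_dropout`, `activation_dropout`, `pooler_dropout`), but the constructor itself is hypothetical:

```python
def RobertaCustom(
    vocabulary_size,
    num_layers,
    num_heads,
    hidden_dim,
    intermediate_dim,
    dropout=0.1,             # embedding/residual dropout
    attention_dropout=0.1,   # applied to the attention weights
    activation_dropout=0.0,  # after the feed-forward activation
    pooler_dropout=0.0,      # in the classification head
    max_sequence_length=512,
):
    ...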
@mattdangerw, @chenmoneygithub - For the `MultiSegmentPacker` layer, we need to make one change. Currently, this is the output of the layer: ``` <s>seq1</s>seq2</s> ``` But `</s>` is not always used...
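RoBERTa, for example, separates consecutive segments with a doubled end token (`</s></s>`). A sketch of decoupling the inter-segment separator from the end token, roughly the `sep_value` argument keras-nlp later added (so this API postdates the issue); the token ids follow RoBERTa's vocabulary (`<s>`=0, `<pad>`=1, `</s>`=2).

```python
import tensorflow as tf
import keras_nlp

packer = keras_nlp.layers.MultiSegmentPacker(
    sequence_length=12,
    start_value=0,     # "<s>"
    end_value=2,       # "</s>"
    sep_value=[2, 2],  # RoBERTa separates segments with "</s></s>"
    pad_value=1,       # "<pad>"
)
seq1 = tf.constant([10, 11, 12])
seq2 = tf.constant([20, 21])
token_ids, segment_ids = packer((seq1, seq2))
```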
Currently the vocabulary is passed directly to `BertPreprocessor`, for example `'uncased_en'`. However, going forward we want to request the vocabulary associated with a checkpoint rather than knowing what vocab the...
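A minimal sketch of that lookup, resolving the vocab file from the checkpoint id instead of naming it directly; the registry, the placeholder URL, and the `vocabulary_for` helper are hypothetical.

```python
from tensorflow import keras

# Hypothetical mapping from checkpoint id to its associated vocabulary.
CHECKPOINTS = {
    "bert_base_uncased_en": {
        "vocabulary_url": "https://.../uncased_en_vocab.txt",  # placeholder
    },
}

def vocabulary_for(checkpoint):
    """Fetch the vocab file tied to a checkpoint id."""
    return keras.utils.get_file(origin=CHECKPOINTS[checkpoint]["vocabulary_url"])

preprocessor = BertPreprocessor(vocabulary=vocabulary_for("bert_base_uncased_en"))
```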