Chen Qian comments

Results: 69 comments by Chen Qian

Gradient accumulation support?

@andreped Thanks for reporting the issue! Currently users would need to write their own custom training loop to handle the gradient accumulation, which is not too hard, so we have...
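
As a rough illustration of what such a custom loop does, here is a minimal, framework-free sketch of gradient accumulation on a one-parameter linear model. It is not the Keras API or the approach discussed in the thread; the function names `grad_mse` and `train_step_accumulated`, the toy data, and the learning rate are all assumptions made for this example. The idea is to sum per-micro-batch gradients, weighted by each micro-batch's share of the full batch, and apply a single update at the end.

```python
# Framework-free sketch of gradient accumulation for the model y = w * x.
# All names below are hypothetical and chosen for illustration only.

def grad_mse(w, xs, ys):
    """Gradient w.r.t. w of 0.5 * mean((w * x - y) ** 2) over the given samples."""
    n = len(xs)
    return sum((w * x - y) * x for x, y in zip(xs, ys)) / n


def train_step_accumulated(w, xs, ys, micro_batch_size, lr):
    """Perform one parameter update built from several micro-batches."""
    n = len(xs)
    accumulated = 0.0
    for start in range(0, n, micro_batch_size):
        mb_x = xs[start:start + micro_batch_size]
        mb_y = ys[start:start + micro_batch_size]
        # Weight each micro-batch gradient by its share of the full batch,
        # so the accumulated value equals the full-batch mean gradient.
        accumulated += grad_mse(w, mb_x, mb_y) * len(mb_x) / n
    # Apply the accumulated gradient once, as a plain SGD step.
    return w - lr * accumulated


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# One full-batch SGD step versus one accumulated step over two micro-batches.
w_full = 0.0 - 0.1 * grad_mse(0.0, xs, ys)
w_accum = train_step_accumulated(0.0, xs, ys, micro_batch_size=2, lr=0.1)
```

In this toy setting the accumulated step matches the full-batch step. Layers whose behavior depends on batch statistics would not be covered by a sketch this simple.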

Gradient accumulation support?

Thanks all for the great discussion! @andreped Thanks for raising the BN issue, yes, it's something we should support. Actually I am curious about the performance loss if we don't handle...

Unexpected breaking change: Optimizer.get_weights() removed

@adriangb Thanks for reporting the issue! There has not been any change on `get_weights()` for months. For loading optimizer weights, please make sure you call `load_weights()` if you want to...

BPE tokenizer

@mattdangerw Definitely! Will add in the next commit.

BytePair Tokenizer Implementation

@mattdangerw Do we still have unresolved issues on functionality of this implementation? I played around with it a bit more and the functionality looks correct to me (compared with RoBERTa's...

BytePair Tokenizer Implementation

@mattdangerw Simply wrapping with `py_function` causes many runtime errors; the alternative to this PR is not supporting the `tf.data` pipeline, e.g., [HuggingFace Roberta TF model](https://huggingface.co/roberta-base), which actually won't cause performance...

BytePair Tokenizer Implementation

Try running network_tests on our GCP CI

Add checkpoints for fine-tuned models

New optimizers are incompatible with `jit_compile` and `MirroredStrategy`