Chen Qian

Results 69 comments of Chen Qian

@andreped Thanks for reporting the issue! Currently users would need to write their own custom training loop to handle the gradient accumulation, which is not too hard, so we have...
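
For readers unfamiliar with the idea, here is a minimal sketch of gradient accumulation in a hand-written training loop. It is plain Python on a toy one-parameter model (`y = w * x`), not the Keras or TensorFlow API. The names `grad_mse`, `train_with_accumulation`, `micro_batch_size`, and `accum_steps` are hypothetical and chosen only for illustration.

```python
# Gradient accumulation sketch in plain Python (no framework; illustrative only).
# Toy model: y = w * x, loss: mean squared error over a batch.

def grad_mse(w, batch):
    """Gradient of mean((w * x - y)^2) over `batch` with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def train_with_accumulation(data, w=0.0, lr=0.05,
                            micro_batch_size=2, accum_steps=2, epochs=50):
    """Sum gradients over `accum_steps` micro-batches, then apply one update."""
    micro_batches = [data[i:i + micro_batch_size]
                     for i in range(0, len(data), micro_batch_size)]
    for _ in range(epochs):
        accumulated = 0.0  # running sum of micro-batch gradients
        pending = 0        # micro-batches seen since the last update
        for batch in micro_batches:
            accumulated += grad_mse(w, batch)
            pending += 1
            if pending == accum_steps:
                # Average so the step matches one large batch of equal-size parts.
                w -= lr * accumulated / accum_steps
                accumulated, pending = 0.0, 0
        if pending:
            # Flush leftover micro-batches at the end of the epoch.
            w -= lr * accumulated / pending
    return w


data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0), (4.0, 12.0)]  # generated with w = 3
w = train_with_accumulation(data)
```

With equal-size micro-batches, the averaged accumulated gradient equals the gradient of the combined batch, so the update is the same as training with a batch `accum_steps` times larger. Layers that depend on batch statistics, such as batch normalization, are not covered by this sketch.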

Thanks all for the great discussion! @andreped Thanks for raising the BN issue, yes, it's something we should support. Actually I am curious about the performance loss if we don't handle...

@adriangb Thanks for reporting the issue! There has not been any change to `get_weights()` for months. For loading optimizer weights, please make sure you call `load_weights()` if you want to...

@mattdangerw Definitely! Will add in the next commit.

@mattdangerw Do we still have unresolved issues with the functionality of this implementation? I played around with it a bit more and the functionality looks correct to me (compared with RoBERTa's...

@mattdangerw Simply wrapping it with `py_function` causes many runtime errors. The alternative to this PR is not supporting `tf.data` pipelines, e.g., [HuggingFace Roberta TF model](https://huggingface.co/roberta-base), which actually won't cause performance...

@abheesht17 Thanks for raising it! It's actually fine: `#version: 0.2` won't be applied as a merge rule, because that would require seeing a token `#version:0.2`, but it should be...

Actually giving it a second thought - we may not want to trigger network_tests at every request? Currently the network_tests are still small, but in the future it could be...

> Each classifier_checkpoints entry will need to specify num_classes.

Just to clarify - do we want to have this information in the saved file, or a config? My preference is...

@lgeiger Thanks for reporting the issue! Could you try moving the `model.compile()` under strategy scope and rerun the tests in your setup? Also is it only failing with SGD or...