zcain117

4 comments by zcain117

I get the same error when using multiprocess TPU training (also with the fairseq transformer model).

This repo mainly passes metrics that the user computes - I don't think there's any way to get examples/sec after the test is over if the user's test code hasn't...

Oh, maybe you meant adding support for percentiles for any metric written to TensorBoard, rather than trying to compute examples/sec. That should be doable.
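For illustration only, here is a rough sketch of what percentiles over a metric's logged values could look like. It is not this repo's implementation: the `percentile` helper and the `metric_values` list are hypothetical, and it assumes the scalar values for one metric have already been read out of the TensorBoard logs into a plain list.

```python
import math


def percentile(values, q):
    """Return the q-th percentile (0-100) of values, using linear interpolation."""
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100")
    ordered = sorted(values)
    # Fractional position of the requested percentile in the sorted list.
    rank = (len(ordered) - 1) * q / 100
    lower = math.floor(rank)
    upper = math.ceil(rank)
    if lower == upper:
        return ordered[lower]
    # Interpolate between the two neighbouring values.
    weight = rank - lower
    return ordered[lower] * (1 - weight) + ordered[upper] * weight


# Hypothetical values for a single metric, one per logged step.
metric_values = [120.0, 135.0, 128.0, 140.0, 110.0]
summary = {f"p{q}": percentile(metric_values, q) for q in (50, 90, 99)}
```

Because this only needs the logged values, the same helper would apply to any scalar metric, not just examples/sec.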

`time_to_accuracy` is also available now. A sample config that includes it: https://github.com/GoogleCloudPlatform/ml-testing-accelerators/tree/master/metrics_handler#metric_collection_config
Startup time is possible, but the user would need to write some event to TensorBoard to indicate...