Results: 91 issues by Sean Moriarity

Adds the BEiT image transformer model. The implementation shares some similarities with BERT, so I rebased off of that. There is an additional semantic segmentation model I left out because the...

Similar to: https://github.com/tensorflow/tensorflow/issues/34025

kind:feature
area:nx

cc @jonatanklosko @josevalim. Almost all tests are passing; I still have to rebase as well.

We should ignore them entirely rather than just applying stop_grad.

This allows us to override the hook if necessary. I was searching for a solution to provide Layer metadata in the hook function signature, but I don't think it's possible...

Step counts are a bit off. They should be consistent across batches, e.g.:

```
Epoch: 0, Batch: 350, accuracy: 0.5190082 loss: 0.6815987
Epoch: 1, Batch: 341, accuracy: 0.7035355 loss: 0.6067376...
```

Related to #453. By grouping these together in the same part of the map, we've run into lots of issues with state getting updated when it was not meant to be...