Johanne Medina

Results 1 issues of Johanne Medina

The implementation applies a second torch.topk when computing m_i^(n), while Algorithm 1 in the paper defines m_i^(n) over all i_k which is the top-k from the final layer. Could you...