Step 6 of Algorithm 1 code clarification

Open johmedina opened this issue 2 months ago • 0 comments

The implementation applies a second torch.topk when computing m_i^(n), whereas Algorithm 1 in the paper defines m_i^(n) over all i_k, i.e. the top-k indices selected from the final layer. Could you please clarify whether this is intentional or an oversight?

    layer_dot_results = F.cosine_similarity(candidate_gradients_expanded, layer_divergence_expanded, dim=2)
    layer_topk_values, layer_topk_indices = torch.topk(layer_dot_results, evolution_scale)
    layer_topk_topk_indices = topk_indices[layer_topk_indices]
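To make the difference concrete, here is a toy sketch in plain Python, not the repository's code. The `topk` helper is a hypothetical stand-in for `torch.topk` on a 1-D input, and all scores, `k`, and the `evolution_scale` value are made-up numbers. It only contrasts the two index sets: all final-layer top-k indices (one reading of Algorithm 1) versus the subset kept after a second top-k (what the snippet above appears to do).

```python
def topk(values, k):
    """Return (top-k values, their indices), largest first.

    Hypothetical stand-in for torch.topk on a 1-D input.
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in order], order


# Made-up final-layer scores for 6 candidates; keep the top k = 4.
final_scores = [0.1, 0.9, 0.4, 0.8, 0.3, 0.7]
k = 4
_, topk_indices = topk(final_scores, k)

# Made-up per-layer similarities, one per retained candidate,
# in the same order as topk_indices.
layer_dot_results = [0.2, 0.6, 0.1, 0.5]
evolution_scale = 2  # assumed smaller than k for this illustration

# Reading A: use every final-layer top-k index at this layer.
all_indices = list(topk_indices)

# Reading B: apply a second top-k over the per-layer similarities,
# then map the surviving positions back to candidate indices.
_, layer_topk_indices = topk(layer_dot_results, evolution_scale)
layer_topk_topk_indices = [topk_indices[i] for i in layer_topk_indices]

print("all top-k indices:      ", all_indices)
print("after second top-k:     ", layer_topk_topk_indices)
```

In this toy setup the second top-k keeps a strict subset of the final-layer indices whenever `evolution_scale` is smaller than `k`. If the two values are equal, the second top-k only reorders the same indices.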

Nov 06 '25 10:11 johmedina