NodeFormer
Clarification on the Role of 'K' in kernelized_gumbel_softmax Function
- I have a question about the 'kernelized_gumbel_softmax' function: what is the role of 'K'? Does it serve a purpose similar to the number of heads in multi-head attention?
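To make the question concrete, here is a minimal NumPy sketch of one possible reading, in which 'K' counts independent Gumbel noise draws whose results are averaged, rather than separately parameterized heads. This is an assumption for illustration only, not the repository's code: the function name `gumbel_softmax_mean`, its arguments, and the dense N x N formulation are all hypothetical, and the kernelized (random-feature) part of the real function is omitted.

```python
import numpy as np

def gumbel_softmax_mean(logits, values, K=10, tau=0.25, seed=0):
    """Hypothetical sketch: average a Gumbel-perturbed softmax over K noise draws.

    logits: [N, N] attention scores, values: [N, D] node features.
    Returns the averaged output [N, D] and the per-draw weights [K, N, N].
    """
    rng = np.random.default_rng(seed)
    # K independent Gumbel(0, 1) draws for every (query, key) pair: [K, N, N]
    gumbels = rng.gumbel(size=(K,) + logits.shape)
    z = (logits[None, :, :] + gumbels) / tau
    z = z - z.max(axis=-1, keepdims=True)          # numerical stability
    weights = np.exp(z)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each draw aggregates the same values; no per-draw parameters are involved.
    outputs = weights @ values                     # [K, N, D]
    return outputs.mean(axis=0), weights

# Usage: 4 nodes, 2 features, 3 noise draws
logits = np.zeros((4, 4))
values = np.arange(8.0).reshape(4, 2)
out, w = gumbel_softmax_mean(logits, values, K=3)
print(out.shape, w.shape)
```

Under this reading, the K axis shares the same queries, keys, and values and differs only in the sampled noise, whereas attention heads each have their own projections. Is that the intended interpretation?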