Recurrent-Independent-Mechanisms

Update RIM.py

Open eugenelet opened this issue 5 years ago • 2 comments

This is based on the description given in the paper: "Based on the softmax values in (4), we select the top k_A RIMs (out of the total K RIMs) to be activated for each step, which have the least attention on the null input,....."

Hence we have to perform softmax prior to the construction of the mask.

eugenelet avatar Mar 19 '20 16:03 eugenelet

Thanks for the PR and for reading the code in detail. Since we only need the indices of the top k_A RIMs to construct the mask, skipping the softmax before computing the mask should be fine: the indices would be the same as after softmax. Softmax converts a given vector into a probability distribution and preserves ordering, so the highest value in the vector before softmax has the highest probability after softmax. Let me know if you still find anything wrong with this, or any other discrepancies.
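
A minimal sketch of this argument, assuming the softmax and the top-k selection both run over the same vector. The score values and the helper names `softmax` and `top_k_indices` are made up for illustration and are not taken from RIM.py.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_indices(xs, k):
    """Indices of the k largest values, largest first."""
    return sorted(range(len(xs)), key=lambda i: xs[i], reverse=True)[:k]


# Hypothetical raw scores for four RIMs (illustrative values only).
scores = [0.3, -1.2, 2.5, 0.9]
probs = softmax(scores)

# Softmax is strictly increasing in each entry of the vector it is
# applied to, so the ranking within that vector does not change.
same = top_k_indices(scores, 2) == top_k_indices(probs, 2)
print(top_k_indices(scores, 2), top_k_indices(probs, 2), same)
```

The equality holds only when the ranking is taken along the same axis the softmax normalizes.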

dido1998 avatar Mar 19 '20 21:03 dido1998

Thanks for going through my PR. Here's my understanding: the softmax normalizes the final dimension (input and null) into a probability distribution, which is local to each RIM. The search for the top k_A RIMs is based on these normalized "input" values. Without going through the softmax first, the raw scores may differ in magnitude across RIMs (global), so the ranking could change. Let me know if my understanding is correct.
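
A small sketch of the concern, assuming each RIM has one pair of attention logits `[input, null]`, the softmax is applied within each pair, and the top k_A RIMs are ranked by their input score. The logit values and helper names are hypothetical, chosen only to show that the two rankings can differ.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_indices(xs, k):
    """Indices of the k largest values, largest first."""
    return sorted(range(len(xs)), key=lambda i: xs[i], reverse=True)[:k]


# Hypothetical logits, one [input, null] pair per RIM (K = 3).
logits = [[5.0, 6.0], [2.0, 0.0], [3.0, 3.0]]
k_a = 2

# Ranking on raw input logits compares magnitudes across RIMs.
raw_input_scores = [pair[0] for pair in logits]

# Ranking after a per-RIM softmax compares input vs. null within each RIM.
input_probs = [softmax(pair)[0] for pair in logits]

top_raw = top_k_indices(raw_input_scores, k_a)   # [0, 2]
top_softmax = top_k_indices(input_probs, k_a)    # [1, 2]
print(top_raw, top_softmax)
```

Here RIM 0 has the largest raw input logit but attends mostly to the null input, so it drops out once the per-RIM softmax is applied first.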

eugenelet avatar Mar 20 '20 00:03 eugenelet