XMem
Question about memory consolidation methods
This is a result figure from an ablation experiment in your 'XMem' paper. I would like to ask how you conducted the comparison with the Random method. Do you have any hypotheses on why Random performs better than K-means? Additionally, the Random method does not perform significantly better than the Usage-based method. Looking forward to your response.
Hello, thank you for the question!
We do find random selection to be surprisingly effective. One hypothesis is that the more heavily "used" memory locations are also over-represented in memory (e.g., large objects or background regions), so a random selection is likely to pick them as well. K-means is known to struggle in high dimensions. Combined with the intuition above, it might under-represent those large clusters by assigning them only one centroid while selecting more isolated points instead.
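To make the hypothesis above concrete, here is a minimal toy sketch, not the code used in the paper. It builds synthetic "memory keys" with one large cluster and several small ones, then picks k prototypes by random selection, by k-means, and by a usage score. Everything in it is an assumption made for illustration: the function names, the cluster sizes and dimensions, the k-means++ seeding, and the use of softmax attention over negative squared L2 distance as the usage score. It only reports what fraction of the selected prototypes fall in the large cluster; the numbers depend on the toy data and the seed and say nothing about real tracking accuracy.

```python
import numpy as np

def sq_dists(a, b):
    """Pairwise squared Euclidean distances, shape (len(a), len(b))."""
    d = (a ** 2).sum(axis=1)[:, None] + (b ** 2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.maximum(d, 0.0)  # clip tiny negatives caused by round-off

def make_toy_memory(rng, n_large=900, n_small_clusters=10, n_per_small=10, dim=64):
    """Hypothetical keys: one large cluster (e.g. background) plus small ones."""
    large = rng.normal(0.0, 0.5, size=(n_large, dim))
    centers = rng.normal(0.0, 3.0, size=(n_small_clusters, dim))
    noise = rng.normal(0.0, 0.5, size=(n_small_clusters * n_per_small, dim))
    small = centers.repeat(n_per_small, axis=0) + noise
    keys = np.concatenate([large, small], axis=0)
    is_large = np.arange(len(keys)) < n_large
    return keys, is_large

def select_random(rng, keys, k):
    """Uniformly sample k distinct memory indices."""
    return rng.choice(len(keys), size=k, replace=False)

def select_kmeans(rng, keys, k, n_iter=20):
    """Lloyd's k-means with k-means++ seeding; centroids are snapped to keys."""
    centroids = [keys[rng.integers(len(keys))]]
    for _ in range(k - 1):
        d = sq_dists(keys, np.stack(centroids)).min(axis=1)
        centroids.append(keys[rng.choice(len(keys), p=d / d.sum())])
    centroids = np.stack(centroids)
    for _ in range(n_iter):
        assign = sq_dists(keys, centroids).argmin(axis=1)
        for j in range(k):
            members = keys[assign == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    # Nearest key to each centroid; duplicates are merged, so this may return < k.
    return np.unique(sq_dists(keys, centroids).argmin(axis=0))

def select_usage(keys, queries, k):
    """Top-k keys by cumulative softmax attention received from the queries."""
    logits = -sq_dists(queries, keys)            # assumed similarity: negative squared L2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)
    usage = attn.sum(axis=0)
    return np.argsort(usage)[-k:]

rng = np.random.default_rng(0)
keys, is_large = make_toy_memory(rng)
# Queries are noisy copies of randomly chosen keys, so most come from the large cluster.
queries = keys[rng.choice(len(keys), size=200)] + rng.normal(0.0, 0.5, size=(200, keys.shape[1]))
k = 20
picks = {
    "random": select_random(rng, keys, k),
    "kmeans": select_kmeans(rng, keys, k),
    "usage": select_usage(keys, queries, k),
}
for name, idx in picks.items():
    print(f"{name:>7}: {is_large[idx].mean():.2f} of prototypes come from the large cluster")
```

Comparing the printed fractions against the 0.90 share of the large cluster in this toy memory gives a rough feel for how each selection rule distributes its prototypes.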
Thank you for your response, which has given me a new understanding of the memory mechanism.
I would like to ask one more question: in your comparative experiments on different long-term memory mechanisms, were the experiments conducted using only long-term memory, or were the results in the table obtained with sensory memory and working memory also enabled? I am particularly interested in whether you have ever used only long-term memory for video tracking and how the results compared. I look forward to your response.
The other types of memory remain in use in those experiments. Using long-term memory alone would probably degrade performance too much.