C5

Several detailed questions

shuwei666 opened this issue 1 year ago · 1 comment

Thanks for your great work! It has sparked a lot of inspiration for me. However, there are several aspects I would like to discuss further:

The paper mentioned: "To allow the network to reason about the set of additional input images in a way that is insensitive to their ordering, we adopt the permutation invariant pooling approach of Aittala et al."

1. Could you elaborate on why insensitivity to ordering is crucial? Specifically, I'm curious whether a sufficiently large training dataset would inherently cover all potential orderings.
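For context on the pooling the quote refers to: a symmetric reduction (e.g., max or mean) over the set axis makes the output identical for any ordering of the additional images, by construction, so no amount of training data is needed to "cover" orderings. The sketch below is only an illustration with hypothetical feature shapes, not the authors' actual architecture:

```python
import numpy as np

# Hypothetical setup: m additional images, each already encoded to a
# d-dimensional feature vector by some shared per-image encoder.
rng = np.random.default_rng(0)
m, d = 8, 16
features = rng.standard_normal((m, d))  # one row per additional image

# Permutation-invariant pooling: reduce over the set axis with a
# symmetric operation. Max is used here; mean works the same way.
pooled = features.max(axis=0)

# Shuffling the rows (i.e., reordering the input images) leaves the
# pooled descriptor unchanged.
shuffled = rng.permutation(features, axis=0)
assert np.allclose(pooled, shuffled.max(axis=0))
```

The invariance holds exactly, for every permutation, which is why it is preferred over hoping the network learns order-insensitivity from data.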

Regarding the number of additional unlabeled images (m), it appears that they were used in both the training and testing stages. From the ablation study, it seems that the various values of m were only tested on the test camera, as illustrated in Table 4. I have a question about this:

2. During the training process, did you experiment with varying quantities for 'm', or was there a consistent fixed number applied throughout, for example, 8?

When m equals 1, I understand that this means only the query image is used during testing. If so, my question is:

3. Could you clarify whether m=1 signifies a purely zero-shot condition, i.e., direct inference only, or whether the single query image is used for self-calibration, with the parameters then fixed, followed by inference?

4. From the results shown in Table 4, the error does not decrease monotonically as m increases (e.g., error(m=13) > error(m=7)). Could you provide some insight into this?

5. Have you considered using additional labeled images for fine-tuning? If so, would this lead to better results than the current method?

Thank you for taking the time to answer these questions. Your responses will greatly aid my understanding.

shuwei666 · Jun 08 '23 11:06