Something interesting in your paper
📚 Documentation
I noticed the formula between (2) and (3) in your original paper:

$\log(P_{bg}) = \log\big(P(bg \mid O_k = 1)\,P(O_k = 1) + P(O_k = 0)\big)$
The assumption $P(bg \mid O_k = 0) = 1$ is easy to understand, but I wonder how you interpret the term $P(bg \mid O_k = 1)\,P(O_k = 1)$ on the right-hand side of the formula above. Here's my understanding:
You use the law of total probability to decompose the entire process into the first (proposal) stage and the second (refinement & classification) stage. $P(O_k = 1)$ is the probability that a proposal is predicted as foreground, while $P(bg \mid O_k = 1)$ is the probability that, given a proposal predicted as foreground which actually belongs to the background, the second stage revises it back to background.
Is that understanding correct?
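As a sanity check on my reading of the formula, here's a tiny numeric sketch of the decomposition. The probability values are made up purely for illustration; only the algebraic structure comes from the paper:

```python
import math

# Hypothetical values, just to illustrate the decomposition:
# P(bg) = P(bg | O_k = 1) * P(O_k = 1) + P(bg | O_k = 0) * P(O_k = 0),
# with the paper's assumption P(bg | O_k = 0) = 1.
p_obj = 0.7           # P(O_k = 1): first-stage objectness (assumed value)
p_bg_given_obj = 0.2  # P(bg | O_k = 1): second stage flips a "foreground"
                      # proposal back to background (assumed value)

# Law of total probability over O_k, with P(bg | O_k = 0) = 1,
# so the second term reduces to P(O_k = 0) = 1 - P(O_k = 1):
p_bg = p_bg_given_obj * p_obj + 1.0 * (1.0 - p_obj)

log_p_bg = math.log(p_bg)
print(p_bg, log_p_bg)
```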
/* Following that, here's another question: how do you model $P(bg \mid O_k = 1)$ in your network, and how can you show that you are not actually modeling $P(bg \mid O_k = 0)$ instead? Is it the assumption $P(bg \mid O_k = 0) = 1$ that makes this work? */
I noticed something wrong with the question annotated above: the last linear layer (i.e. logistic/softmax regression) actually models $P(C_k \mid O_k)$ rather than the specific condition $O_k = 0$ or $O_k = 1$, so that last question may not exist. I'm keeping it here for anyone with the same misunderstanding.
I'd appreciate your reply, thanks!