autodial
autodial copied to clipboard
A question about the loss
Why log loss[-sum(log f (yis; xsi ))] is used in multi classification problem instead of cross entropy? For example, I have three categories. When the prediction y of the network is correct, such as equal to (1,0,0) or (0,1,0), the loss is infinite.