fastcomposer
Question about balanced L1 loss
I have a few questions about this loss:
1. The `threshold` argument is never used. Why?
2. If the two attention maps are the same, the loss is -1. Shouldn't the loss be optimized from something positive down to near zero?
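To make point 2 concrete, here is a minimal sketch of the loss as I read it. It assumes that "the two maps" are the attention map and the object segmentation mask, and that the loss is the mean attention on the background minus the mean attention on the object. The function name `balanced_l1_loss` and the `eps` parameter are hypothetical, not the repository's code, and the sketch leaves out `threshold` since it does not seem to affect the result.

```python
def balanced_l1_loss(attn, mask, eps=1e-5):
    """Mean attention on the background minus mean attention on the object.

    attn: flat list of attention probabilities in [0, 1]
    mask: flat list of 0/1 values, 1 where the object is
    (Assumed formulation for illustration, not the repository's implementation.)
    """
    obj_area = sum(mask) + eps
    bg_area = sum(1 - m for m in mask) + eps
    obj_mean = sum(a * m for a, m in zip(attn, mask)) / obj_area
    bg_mean = sum(a * (1 - m) for a, m in zip(attn, mask)) / bg_area
    return bg_mean - obj_mean


mask = [1, 1, 0, 0]
perfect = balanced_l1_loss([1.0, 1.0, 0.0, 0.0], mask)   # attention equals the mask
uniform = balanced_l1_loss([0.5, 0.5, 0.5, 0.5], mask)   # attention ignores the mask
inverted = balanced_l1_loss([0.0, 0.0, 1.0, 1.0], mask)  # attention only on background
print(perfect, uniform, inverted)
```

Under these assumptions the three values come out at roughly -1, 0, and +1, so the best case sits at -1 instead of 0. Adding a constant 1 would move the minimum to 0 without changing the gradients, which is why I am asking whether the offset is intentional.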
same question