Distilling-Object-Detectors
about imitation loss weight
Thanks for your code! When I use the imitation loss on my dataset, I'm confused about how to determine the imitation loss weight. Without the imitation loss, my total loss is about 1e-2; with the default imitation loss weight (0.01), my imitation loss is about 1.3. How can I balance the other losses against the imitation loss?
Hi, could you rephrase your question?
That is, at the start of training my loss (classification loss + localization loss) is about 1e-2, while the imitation loss is around 300. Should I give the imitation loss a very small weight (e.g. 1e-4) so that it ends up at about the same level as the other loss? Or what ratio between the two losses works best?
For custom data, I suggest you first keep the two losses at a similar level, then tune the imitation loss weight from there.
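A minimal sketch of that suggestion, using the numbers from this thread (variable names are illustrative, not the repository's actual code): fix the imitation weight from the initial loss magnitudes so the two terms start at a similar level, then tune the weight up or down.

```python
# Sketch only: pick the imitation weight so the weighted imitation loss starts
# at roughly the same level as the detection loss, then tune from there.

# Measured at the start of training on the custom dataset (from this thread):
init_det_loss = 1e-2          # classification + localization loss
init_imitation_loss = 300.0   # unweighted imitation loss

# Weight that puts the two terms at roughly the same level initially (~3e-5 here).
imitation_weight = init_det_loss / init_imitation_loss

def total_loss(cls_loss, box_loss, imitation_loss, weight=imitation_weight):
    # Detection loss plus weighted feature-imitation loss.
    return cls_loss + box_loss + weight * imitation_loss
```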
Thanks! I tried it and found it works better when the imitation loss is about 6~10 times the other loss. Also, is the kernel size of the adaptation layer important? I noticed you use a 3x3 kernel with padding=1; what about using a 1x1 kernel?
We did not examine the choice of adaptation-layer kernel size; you can try tuning it on your data.
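For reference, a rough sketch of the two adaptation-layer variants being discussed (channel sizes are placeholders, not the repository's actual configuration). Both keep the spatial resolution unchanged, so the imitation loss can still be computed element-wise against the teacher feature map; only the receptive field and parameter count differ.

```python
import torch
import torch.nn as nn

student_channels, teacher_channels = 256, 512  # example values only

# 3x3 kernel with padding=1, as used in the released code.
adapt_3x3 = nn.Conv2d(student_channels, teacher_channels,
                      kernel_size=3, stride=1, padding=1)

# 1x1 kernel: a pure channel projection, fewer parameters, no spatial mixing.
adapt_1x1 = nn.Conv2d(student_channels, teacher_channels,
                      kernel_size=1, stride=1, padding=0)

feat = torch.randn(1, student_channels, 38, 50)
assert adapt_3x3(feat).shape == adapt_1x1(feat).shape  # same output size
```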
Thanks! The 1x1 kernel size works better for my data.