ActionCLIP
About KLLoss
Thanks for your amazing work!
The KLLoss in the implementation is effectively divided by the feature dimension instead of the batch size: the code multiplies the mean-reduced loss by batch_size, which cancels the batch part of the mean.
The PyTorch docs point out that `reduction='batchmean'` is the option that aligns with the mathematical definition of KL divergence. I'd like to ask the reason for this implementation choice.
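To make the difference concrete, here is a minimal pure-Python sketch of the arithmetic (no PyTorch dependency). It assumes `nn.KLDivLoss` semantics with log-probability inputs and probability targets (`log_target=False`), where the pointwise term is `target * (log(target) - input)`. The function names and example logits are hypothetical. It only shows that a mean-reduced loss multiplied by the batch size equals the summed KL divided by the number of columns, whereas `'batchmean'` divides the sum by the batch size.

```python
import math

def log_softmax(row):
    # Numerically stable log-softmax over one row of logits.
    m = max(row)
    lse = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - lse for x in row]

def softmax(row):
    # Probabilities obtained from the log-softmax of the same row.
    return [math.exp(v) for v in log_softmax(row)]

def kl_pointwise_sum(log_pred, target):
    # Sum of target * (log(target) - log_pred) over all elements,
    # treating 0 * log(0) as 0 (the KLDivLoss convention for log_target=False).
    total = 0.0
    for lp_row, t_row in zip(log_pred, target):
        for lp, t in zip(lp_row, t_row):
            if t > 0:
                total += t * (math.log(t) - lp)
    return total

def kl_reductions(pred_logits, label_logits):
    # Hypothetical helper comparing the reductions for a (batch, cols) input.
    batch_size = len(pred_logits)
    num_cols = len(pred_logits[0])
    log_pred = [log_softmax(r) for r in pred_logits]
    target = [softmax(r) for r in label_logits]
    s = kl_pointwise_sum(log_pred, target)
    mean = s / (batch_size * num_cols)  # reduction='mean': divide by all elements
    return {
        "sum": s,                         # reduction='sum'
        "mean": mean,
        "mean_times_bs": mean * batch_size,  # pattern described above: sum / num_cols
        "batchmean": s / batch_size,         # reduction='batchmean': sum / batch_size
    }

pred = [[1.0, 2.0, 0.5], [0.1, -1.0, 2.0]]
labels = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
print(kl_reductions(pred, labels))
```

With a batch of 2 and 3 columns, `mean_times_bs` is the summed KL divided by 3, while `batchmean` is the same sum divided by 2, so the two scales differ by a factor of `num_cols / batch_size`.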
Thanks in advance.