big_vision
Mixup Per Example?
Hi! I was wondering why the implementation of mixup samples a single $a$ per batch rather than a separate $a$ for each batch element. Intuitively, sharing one $a$ across the whole batch seems like it should lead to higher variance in the optimization process.
https://github.com/google-research/big_vision/blob/474dd2ebde37268db4ea44decef14c7c1f6a0258/big_vision/utils.py#L1086
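
To make the question concrete, here is a minimal NumPy sketch of the two variants. It is not the big_vision code: the function name `mixup`, the `per_example` flag, the default `p=0.2`, pairing each example with its neighbour via `np.roll`, and clamping `a` to at least 0.5 are all assumptions made for illustration.

```python
import numpy as np


def mixup(rng, images, labels, p=0.2, per_example=False):
    """Hypothetical mixup sketch; names and defaults are illustrative only."""
    batch = images.shape[0]
    # Either one coefficient shared by the whole batch, or one per element.
    shape = (batch,) if per_example else (1,)
    a = rng.beta(p, p, size=shape)
    # Assumption: keep the un-rolled example as the dominant one.
    a = np.maximum(a, 1.0 - a)

    def mix(x):
        # Reshape a so it broadcasts over every non-batch axis of x.
        a_b = a.reshape(a.shape + (1,) * (x.ndim - 1))
        # Blend each example with its neighbour in the batch.
        return a_b * x + (1.0 - a_b) * np.roll(x, shift=1, axis=0)

    return mix(images), mix(labels), a


rng = np.random.default_rng(0)
images = rng.normal(size=(8, 4, 4, 3))
labels = np.eye(10)[rng.integers(0, 10, size=8)]  # one-hot labels

x_batch, y_batch, a_batch = mixup(rng, images, labels, per_example=False)
x_elem, y_elem, a_elem = mixup(rng, images, labels, per_example=True)
```

With `per_example=False` every pair in the batch is blended with the same weight, so a single draw of $a$ decides how strongly the whole step is mixed. With `per_example=True` each pair gets its own draw. The sketch only shows the mechanical difference; it does not by itself say which variant gives lower variance in practice.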