
[beit mask generation] Why is the aspect ratio sampled from log uniform instead of uniform in the mask generation?

Open ShengYun-Peng opened this issue 1 year ago • 2 comments

https://github.com/microsoft/unilm/blob/78b3a48de27c388a0212cfee49fd6dc470c9ecb5/beit/masking_generator.py#L59

The aspect ratio lies in $[0.3, 1/0.3]$. What is the intuition behind sampling it from a log-uniform distribution instead of a uniform one? The range does not span multiple orders of magnitude.

ShengYun-Peng, Jan 04 '24 16:01

Thank you for your question, and it's indeed an excellent point you've brought up.

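If we sample the aspect ratio linearly (e.g., with np.linspace or a plain uniform draw) between 1:3 and 3:1, the cases where the aspect ratio is less than 1 and greater than 1 become unbalanced. The interval from 1/3 to 1 (ratio less than 1) has length 2/3, while the interval from 1 to 3 (ratio greater than 1) has length 2, so only about a quarter of the samples would have a ratio below 1 (about 23% for the exact range $[0.3, 1/0.3]$). This is because a ratio and its reciprocal are not symmetric on a linear scale: flipping a width-to-height ratio of 3 into a height-to-width ratio gives 1/3, which lies three times closer to 1.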
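A minimal Python sketch of the imbalance described above, computing the fraction of ratios below 1 under plain uniform sampling both in closed form and with a Monte Carlo check. The function names are hypothetical and the code is for illustration only; it is not part of the BEiT repository.

```python
import random


def fraction_below_one_linear(lo: float, hi: float) -> float:
    """Exact P(r < 1) when r ~ Uniform(lo, hi), assuming lo < 1 < hi."""
    return (1.0 - lo) / (hi - lo)


def empirical_fraction_below_one_linear(lo: float, hi: float,
                                        n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(r < 1) using random.uniform, for comparison."""
    rng = random.Random(seed)
    below = sum(rng.uniform(lo, hi) < 1.0 for _ in range(n))
    return below / n


if __name__ == "__main__":
    # 1:3 to 3:1, as in the explanation above: about 0.25.
    print(fraction_below_one_linear(1 / 3, 3))
    print(empirical_fraction_below_one_linear(1 / 3, 3))
    # The exact range from the question, [0.3, 1/0.3]: about 0.23.
    print(fraction_below_one_linear(0.3, 1 / 0.3))
```

A symmetric outcome would be 0.5; the linear draw falls well short of that.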

To sample aspect ratios evenly, we can instead draw uniformly in logarithmic space and then exponentiate back to an aspect ratio. Since $\log(1/r) = -\log r$, the log range $[\log 0.3, \log(1/0.3)]$ is symmetric around 0, so a ratio $r$ and its reciprocal $1/r$ are equally likely and half of the samples fall below 1. This keeps the cases where the aspect ratio is greater than 1 and less than 1 in balance.
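A minimal Python sketch of the log-uniform approach, assuming the default range $[0.3, 1/0.3]$ and treating the ratio as height over width. It follows the general pattern of sampling described above (draw uniformly in log space, exponentiate, then turn a target area into a block height and width), but `sample_aspect_ratio` and `block_shape` are hypothetical names, not the repository's API.

```python
from __future__ import annotations

import math
import random


def sample_aspect_ratio(min_aspect: float = 0.3,
                        max_aspect: float | None = None,
                        rng: random.Random | None = None) -> float:
    """Draw an aspect ratio log-uniformly from [min_aspect, max_aspect]."""
    if rng is None:
        rng = random.Random()
    if max_aspect is None:
        # Assumption: default to the reciprocal so the log range is symmetric about 0.
        max_aspect = 1.0 / min_aspect
    log_lo, log_hi = math.log(min_aspect), math.log(max_aspect)
    # Uniform in log space, then map back to a ratio.
    return math.exp(rng.uniform(log_lo, log_hi))


def block_shape(target_area: float, aspect_ratio: float) -> tuple[int, int]:
    """Hypothetical helper: integer (h, w) with h * w ~= target_area and h / w ~= aspect_ratio."""
    h = int(round(math.sqrt(target_area * aspect_ratio)))
    w = int(round(math.sqrt(target_area / aspect_ratio)))
    return h, w


if __name__ == "__main__":
    rng = random.Random(0)
    samples = [sample_aspect_ratio(0.3, rng=rng) for _ in range(100_000)]
    print(sum(r < 1.0 for r in samples) / len(samples))  # close to 0.5
    print(block_shape(16, 4.0))  # (8, 2)
```

Exponentiating a uniform draw in log space is equivalent to sampling $r$ with density proportional to $1/r$ on the interval, which is why $r$ and $1/r$ receive equal probability mass.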

addf400, Jan 05 '24 05:01

Thanks, @addf400! That clarifies my question.

ShengYun-Peng, Jan 05 '24 20:01