maxvit
[ECCV 2022] Official repository for "MaxViT: Multi-Axis Vision Transformer". SOTA foundation models for classification, detection, segmentation, image quality, and generative modeling...
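Why does computing local attention require reshaping the feature map to (H/P × W/P, P², C), that is, with P² in the second-to-last dimension? For global attention, the feature map is instead reshaped to (G², H/G × W/G, C), and then the second-to-last and third-to-last dimensions are swapped to give (H/G × W/G, G², C). Since this final form matches the local one, why not apply the same transformation directly instead of adding an extra dimension swap?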
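One way to see why the two partitions differ even though their final shapes match is to track pixel coordinates through each reshape. The sketch below is not taken from the repository: it uses NumPy on a single (H, W, C) map, and the helper names `block_partition` and `grid_partition` are hypothetical. It follows the shapes described in the question, assuming attention is applied over the second-to-last axis.

```python
import numpy as np

def block_partition(x, p):
    """(H, W, C) -> (H/P * W/P, P*P, C): each group is one contiguous P x P window."""
    h, w, c = x.shape
    x = x.reshape(h // p, p, w // p, p, c)   # split each axis as (window index, offset in window)
    x = x.transpose(0, 2, 1, 3, 4)           # (H/P, W/P, P, P, C)
    return x.reshape(-1, p * p, c)

def grid_partition(x, g):
    """(H, W, C) -> (H/G * W/G, G*G, C): each group is a G x G grid of strided pixels."""
    h, w, c = x.shape
    x = x.reshape(g, h // g, g, w // g, c)   # split each axis as (grid index, offset in cell)
    x = x.transpose(0, 2, 1, 3, 4)           # (G, G, H/G, W/G, C)
    x = x.reshape(g * g, -1, c)              # (G*G, H/G * W/G, C)
    return x.swapaxes(0, 1)                  # (H/G * W/G, G*G, C)

# Use (row, col) coordinates as the "features" so each group shows which pixels it holds.
h = w = 4
coords = np.stack(np.meshgrid(np.arange(h), np.arange(w), indexing="ij"), axis=-1)

local_groups = block_partition(coords, 2)
global_groups = grid_partition(coords, 2)

print(local_groups[0].tolist())   # pixels attended together in the first local window
print(global_groups[0].tolist())  # pixels attended together in the first grid group
```

On this 4×4 map with P = G = 2, the first local group holds the adjacent pixels (0,0), (0,1), (1,0), (1,1), while the first grid group holds the strided pixels (0,0), (0,2), (2,0), (2,2). Both outputs have the same shape, but reusing the block reshape for the global branch would group neighbouring pixels again rather than pixels spread across the whole map.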
I tried to install MaxViT on Windows, but got: C:\Projects\****\venv\lib\site-packages\setuptools\command\install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools. warnings.warn( C:\Projects\***\venv\lib\site-packages\setuptools\command\easy_install.py:144: EasyInstallDeprecationWarning: easy_install command is deprecated....
I tried the preprocessing method `img_ = eval_driver.get_preprocess_fn()(tf.io.read_file(path))`, but I constantly get errors like: ``` {function_node __wrapped__ExtractJpegShape_device_/job:localhost/replica:0/task:0/device:CPU:0}} Invalid JPEG data, size 442707 [Op:ExtractJpegShape] Not a JPEG file: starts with 0x89...
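The byte 0x89 quoted in the error is the first byte of the PNG file signature, which suggests (but does not prove) that a PNG file was passed to a JPEG-only decoding path. A minimal sketch for checking this before preprocessing, using only the standard library; the helper name `sniff_image_format` is hypothetical and not part of the repository:

```python
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"   # fixed 8-byte PNG header
JPEG_SIGNATURE = b"\xff\xd8\xff"       # JPEG start-of-image marker plus next marker byte

def sniff_image_format(data: bytes) -> str:
    """Guess the image format from the leading bytes of a file."""
    if data.startswith(PNG_SIGNATURE):
        return "png"
    if data.startswith(JPEG_SIGNATURE):
        return "jpeg"
    return "unknown"

def sniff_file(path: str) -> str:
    """Read only the first bytes of a file and guess its format."""
    with open(path, "rb") as f:
        return sniff_image_format(f.read(len(PNG_SIGNATURE)))
```

If `sniff_file(path)` reports `"png"`, one option is to convert the image to JPEG first, or to decode it with a format-agnostic decoder before the rest of the preprocessing; whether the preprocessing function in this repository can be configured for that is not confirmed here.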
ModuleNotFoundError: No module named 'maxvit.models'; 'maxvit' is not a package
Thanks for the wonderful paper - it was a pleasure to read! Could you kindly elaborate a bit more on the COCO training details? In particular, I was wondering about...
Hi, good job! Could you tell me how to adapt your code to other input sizes, such as 128×128, 192×192, or 256×256? Thanks.
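A rough way to reason about other input sizes is to check that the window/grid size divides the feature map at every stage. The sketch below assumes the four stages run at 1/4, 1/8, 1/16 and 1/32 of the input resolution, as described in the paper (224×224 input with window size 7); the constant and function names are hypothetical, and the sketch does not cover the repository's configuration options or reuse of pretrained weights.

```python
from functools import reduce
from math import gcd

# Assumption: cumulative downsampling factor at each of the four stages.
STAGE_STRIDES = (4, 8, 16, 32)

def stage_feature_sizes(image_size: int) -> list[int]:
    """Side length of the square feature map at each stage."""
    if image_size % STAGE_STRIDES[-1] != 0:
        raise ValueError(f"image_size must be a multiple of {STAGE_STRIDES[-1]}")
    return [image_size // s for s in STAGE_STRIDES]

def largest_shared_window(image_size: int) -> int:
    """Largest window/grid size that divides the feature map at every stage."""
    return reduce(gcd, stage_feature_sizes(image_size))

for size in (128, 192, 224, 256):
    print(size, stage_feature_sizes(size), largest_shared_window(size))
```

Under these assumptions the largest shared size is simply the last-stage feature map side (image size divided by 32), so 224 gives 7 while 128, 192 and 256 give 4, 6 and 8 respectively.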