
Using ViTMatte-B for inference on Distinctions-646: A100 (80G) out of memory

Open · tenzinOvO opened this issue 2 years ago · 9 comments

tenzinOvO avatar Jul 19 '23 08:07 tenzinOvO

We use the grid sample strategy to reduce the inference computation burden. Please take a look at the last section of our paper for details. The code can be found here.
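For readers who don't want to dig through the code, here is a minimal sketch of the general idea, assuming "grid sample" means subsampling the key/value tokens on a regular grid before global attention; the function name and exact mechanics below are illustrative guesses, not the repo's actual implementation:

```python
import torch

def grid_sampled_attention(q, k, v, hw, stride=2):
    """q, k, v: (B, N, C) token sequences; hw = (H, W) of the token grid, H * W == N."""
    B, N, C = k.shape
    H, W = hw
    # Map keys/values back onto the 2D token grid and keep every `stride`-th
    # token, shrinking the attention map from N x N to N x (N / stride**2).
    k = k.transpose(1, 2).reshape(B, C, H, W)[..., ::stride, ::stride].flatten(2).transpose(1, 2)
    v = v.transpose(1, 2).reshape(B, C, H, W)[..., ::stride, ::stride].flatten(2).transpose(1, 2)
    attn = (q @ k.transpose(-2, -1)) * (C ** -0.5)  # (B, N, N / stride**2)
    return attn.softmax(dim=-1) @ v                 # (B, N, C)
```

Queries stay at full resolution, so every output token is still computed; only the set of tokens attended to is thinned out.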

JingfengYao avatar Jul 19 '23 12:07 JingfengYao

Thanks for your response. So if I want to reproduce the results on Distinctions-646 reported in the paper, do I need to replace the vit.py in ViTMatte with the vit.py in MatteAnything?

tenzinOvO avatar Jul 19 '23 15:07 tenzinOvO

Yes. Or you can replace only the forward function in the Block (see the sketch below). BTW, when you reproduce the results on Distinctions-646, keep in mind that they will be influenced by the trimaps you use.
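A hedged sketch of the monkey-patch approach; the import paths below are assumptions about the two repos' layouts, not verified:

```python
# Illustrative only: patch Block.forward at runtime instead of replacing vit.py.
from vitmatte.modeling.backbone import vit as vitmatte_vit    # path assumed
from matte_anything.modeling.backbone import vit as ma_vit    # path assumed

# Existing and future Block instances all pick up the patched method.
vitmatte_vit.Block.forward = ma_vit.Block.forward
```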

JingfengYao avatar Jul 19 '23 16:07 JingfengYao

Thanks for the reminder. I was curious whether the pseudo trimaps in MatteAnything (Tables 1-4) were obtained through a real user study or through a code-based simulation of user interactions derived from the ground truth.

tenzinOvO avatar Jul 20 '23 08:07 tenzinOvO

It's a real user study.

JingfengYao avatar Jul 20 '23 09:07 JingfengYao

The matting results are markedly better than MatteFormer's, and the big model is also visibly better than the small one. However, the memory requirement for inference is markedly higher as well: with the forward function in vit.py from this repo, it is 16x higher for the big model and 8x higher for the small model. Even after changing the forward function to the one in MatteAnything, the big model still demands more memory than MatteFormer, which can run inference on the same (hi-res) image on the same machine with about 80 GiB of GPU memory, whereas ViTMatte can't start the job (see the error message below). Any other suggestions for further reducing the memory footprint?

torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 53.78 GiB (GPU 0; 79.10 GiB total capacity; 68.61 GiB already allocated; 8.59 GiB free; 69.09 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
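Side note: the max_split_size_mb hint in that message only mitigates allocator fragmentation; it won't shrink the ~54 GiB attention allocation itself. A minimal way to try it, assuming nothing has initialized CUDA yet; the 512 value is just a starting guess:

```python
import os

# Must be set before the first CUDA allocation (i.e., before any torch.cuda call).
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

import torch  # imported after setting the env var, to be safe
```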

shiwanlin avatar Oct 25 '23 18:10 shiwanlin

ViTMatte's high memory requirement mainly comes from the attention mechanism in the ViT backbone. From my perspective, I would try memory-efficient attention or flash attention as a replacement for the original attention in ViT to further reduce the computation burden. (NOTE: using a different attention implementation at inference time may cause performance degradation due to the inconsistency between training and inference.)
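For example (a sketch, not this repo's code): in PyTorch 2.x, the naive softmax(QK^T)V can be swapped for the fused torch.nn.functional.scaled_dot_product_attention, which dispatches to flash / memory-efficient kernels and avoids materializing the full N x N attention matrix. This assumes q, k, v are already in multi-head layout; the slight numerical differences are exactly the train/inference inconsistency noted above.

```python
import torch
import torch.nn.functional as F

def naive_attention(q, k, v):
    # q, k, v: (B, num_heads, N, head_dim). Materializes the full (N x N) map.
    attn = (q @ k.transpose(-2, -1)) * (q.shape[-1] ** -0.5)
    return attn.softmax(dim=-1) @ v

def fused_attention(q, k, v):
    # Same math; dispatched to flash / memory-efficient kernels when available.
    return F.scaled_dot_product_attention(q, k, v)
```

On older PyTorch versions, xformers.ops.memory_efficient_attention is a comparable drop-in.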

JingfengYao avatar Oct 26 '23 04:10 JingfengYao

Can you share the Distinctions-646 dataset? For some reason, the author is not reachable now.

felix-ky avatar Nov 22 '23 08:11 felix-ky

@felix-ky https://github.com/yuhaoliu7456/CVPR2020-HAttMatting

almorozovv avatar Nov 25 '23 15:11 almorozovv