OctoNinja9
In the line below, the code appears to use `masks` to predict boxes: https://github.com/microsoft/X-Decoder/blob/165f8a6314ac84f5c36aaab7216f90dd97e38a43/modeling/architectures/xdecoder_model.py#L922 However, by [line 913](https://github.com/microsoft/X-Decoder/blob/165f8a6314ac84f5c36aaab7216f90dd97e38a43/modeling/architectures/xdecoder_model.py#L913), the predicted boxes have already been obtained. Why not use...
Hi, thanks for releasing the code. Why did you choose the `MPI adaptor` instead of the default `torch.distributed` (as in [Mask2Former](https://github.com/facebookresearch/Mask2Former/blob/9b0651c6c1d5b3af2e6da0589b719c514ec0d69a/train_net.py#L321C8-L321C8))?
Dear author, I do not understand the difference between `refs` and `ref_values`. Could you explain how they differ? https://github.com/FoundationVision/UniRef/blob/e339305039ffaa5500a25e34cf6a0677c109a2c9/projects/UniRef/uniref/models/fuse_helper/unifusion.py#L79
Hi, can you tell me how to evaluate caption generation performance using your code?
Hi, @Haiyang-W! You have done very interesting work. However, I encountered a problem when calculating the FLOPs of the GiT model. When I run `python tools/analysis_tools/get_flops.py`, it outputs 0...