Yi Yang
Hi @vrk7, do you still have problems with the Colab?
Please find our study of inference time in Table 9 of the paper: https://arxiv.org/pdf/2306.08637.pdf In that table, we limit the setup to a maximum of 50 frames and 50 points. Inference...
Hi bhack, Thanks for the observation. Here is some more information: 1. The current model is trained at 256x256 resolution; the huber_loss and expected_distance are both set under the 256x256...
Running a huge chunk of frames requires a lot of memory; for example, the ResNet computes dense feature maps for every frame and stores them in GPU memory. If...
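As a rough, back-of-the-envelope illustration of the scale involved (the grid strides, channel counts, and resolution below are assumptions made purely for the arithmetic, not the exact TAPIR internals):

```python
# Rough memory estimate for float32 feature grids kept for every frame.
# The grid strides and channel counts are illustrative assumptions only.
num_frames = 300
height, width = 512, 512
grids = [
    (height // 8, width // 8, 256),  # coarse grid (assumed)
    (height // 4, width // 4, 128),  # finer grid (assumed)
]
bytes_per_frame = sum(h * w * c * 4 for h, w, c in grids)
print(f"~{num_frames * bytes_per_frame / 1e9:.1f} GB just to store the grids")
# Intermediate activations during inference add several times more on top.
```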
You also need to precompute your query_feature beforehand, and run model prediction with your precomputed query_feature. Code for doing that is provided below (not thoroughly checked): ``` def build_model_init(frames, query_points,...
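Since the snippet above is cut off in this excerpt, here is a minimal sketch of the same init/predict split, modeled on the pattern in the repo's colab demos and assuming the JAX `tapir_model.TAPIR` methods `get_feature_grids`, `get_query_features`, and `estimate_trajectories`; exact names and arguments may differ across versions of the repo.

```python
import haiku as hk
import jax
from tapnet import tapir_model  # import path may differ in newer repo versions


def build_model_init(frames, query_points):
  """Precompute query features from the frames that contain the queries."""
  model = tapir_model.TAPIR()
  feature_grids = model.get_feature_grids(frames, is_training=False)
  query_features = model.get_query_features(
      frames,
      is_training=False,
      query_points=query_points,
      feature_grids=feature_grids,
  )
  return query_features


def build_model_predict(frames, query_features):
  """Estimate trajectories for a chunk of frames using precomputed queries."""
  model = tapir_model.TAPIR()
  feature_grids = model.get_feature_grids(frames, is_training=False)
  return model.estimate_trajectories(
      frames.shape[-3:-1],  # (height, width)
      is_training=False,
      feature_grids=feature_grids,
      query_features=query_features,
      query_points_in_video=None,
      query_chunk_size=64,
  )


# Haiku-transform and jit, as in the colab demos.
model_init = hk.transform_with_state(build_model_init)
model_predict = hk.transform_with_state(build_model_predict)
init_apply = jax.jit(model_init.apply)
predict_apply = jax.jit(model_predict.apply)
```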
For each frame, TAPIR needs to see enough temporal context frames to make the best possible prediction. However, due to memory limitations, the boundary frames (e.g. frame 300 in your...
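One way to give every frame enough context is to run prediction on overlapping windows and keep only the central frames of each window. The sketch below assumes a hypothetical `predict_chunk(frames_chunk, query_features)` wrapper around the jitted predict function and is not the exact code from this thread:

```python
import numpy as np


def track_in_overlapping_chunks(frames, query_features, predict_chunk,
                                chunk_size=80, context=20):
  """Stitch predictions from overlapping windows of a long video.

  `frames` is assumed to be [num_frames, height, width, 3], and
  `predict_chunk(frames_chunk, query_features)` is a hypothetical wrapper
  around the jitted predict function returning tracks of shape
  [num_points, chunk_len, 2].
  """
  num_frames = frames.shape[0]
  outputs = []
  start = 0
  while start < num_frames:
    # Add `context` extra frames on each side so the frames we keep still
    # see enough temporal context.
    lo = max(0, start - context)
    hi = min(num_frames, start + chunk_size + context)
    tracks = predict_chunk(frames[lo:hi], query_features)
    # Keep only the central (non-context) frames of this window.
    keep_from = start - lo
    keep_to = keep_from + min(chunk_size, num_frames - start)
    outputs.append(np.asarray(tracks)[:, keep_from:keep_to])
    start += chunk_size
  return np.concatenate(outputs, axis=1)
```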
Hi @YJ-142150, the TrecViT code is now released at https://github.com/google-deepmind/trecvit, please take a look. Thanks.
Hi @Kou-99, thanks for your interest! At the moment, we don’t have plans to release the BootsTAP training code since it is tightly integrated with Google’s internal systems.
Hi @Kou-99, as Carl mentioned, it is normal that some Kinetics videos have been taken down. Luckily there are ~1000 videos, so missing a few may not affect performance significantly. I will close...
Hi weirenorweiren, Can you try commenting out this line https://github.com/google-deepmind/tapnet/blob/a0cd34d4e25bbc8511041b8fcbf2e21445952bba/tapnet/__init__.py#L21 and then see whether live_demo.py works without the tensorflow dependency? Thanks.