Yibo Zhao
also interested to know
I have a basic question. When calculating the CLIP score, is it right to compute the CLIP score of every coco2017val image-text pair and then average them? And what are the...
> We set it to empty ‘’ Thank you. When calculating FID, do you resize the images in the two folders to the same size, or apply any other preprocessing?
I changed your code, but similar issues still occur. Whether I am retrieving video information, such as: yt-dlp --skip-unavailable-fragments -F https://www.bilibili.com/video/BV1Dh411r7et --cookies test_cookie.txt or downloading a video: yt-dlp --skip-unavailable-fragments --merge-output-format...
window._BiliGreyResult = { method: "direct", versionId: "64940", } 验证码_哔哩哔哩 ("Verification code_bilibili") window._riskdata_ = { 'v_voucher': 'voucher_e0f54c76-74d9-4c84-ba5a-4cddff9afe54' } Thank you for the advice. It looks like a verification code is required when making multiple requests.
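You can step through it in a debugger. Assume the tensor shape is (b, f, h, w). Earlier work usually reshapes to (b * f, h * w) for spatial attention and to (b * h * w, f) for temporal attention. CogVideoX instead uses 3D full attention over (b, f * h * w). Taking 10 * 480 * 720 as an example, its attention map is (2, 48, 3 * 30 * 45 + 226, 3 * 30 * 45 + 226), where 2 is the batch size, 48 is the number of heads, 3 is the number of frames after 3D VAE compression, 30 and 45 are the height and width of the attention map, and 226 is the text embedding length. I wrote a visualization of the 3D... in CogVideoX...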