2 comments by Kang Liu

```python
logits = coca(
    text = text,
    images = images
) # (4, 512, 20000)
```

I also have the same question. Although the caption logits can be obtained using...
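For context on the shape in the snippet, here is a minimal sketch that reads caption logits greedily. It assumes the `(4, 512, 20000)` comment means `(batch, sequence length, vocabulary size)`. The helper `greedy_token_ids` is hypothetical and not part of any library. It uses nested lists so it runs without PyTorch; with a tensor, the same step is `logits.argmax(dim=-1)`.

```python
from typing import List

# Hypothetical helper: greedy (argmax) reading of caption logits.
# Assumes nested lists shaped (batch, seq_len, vocab_size), matching the
# (4, 512, 20000) comment above. With a PyTorch tensor: logits.argmax(dim=-1).
def greedy_token_ids(logits: List[List[List[float]]]) -> List[List[int]]:
    """Return, for each sequence and position, the index of the highest-scoring token."""
    return [
        [max(range(len(position)), key=position.__getitem__) for position in sequence]
        for sequence in logits
    ]

# Toy example: batch of 1, sequence length 2, vocabulary of 3 tokens.
toy_logits = [[[0.1, 2.0, -1.0], [0.5, 0.2, 3.0]]]
print(greedy_token_ids(toy_logits))  # [[1, 2]]
```

Because `text` is passed in as input, each position's prediction is conditioned on the given text prefix (teacher forcing). The argmax ids are therefore per-position next-token predictions, not a caption generated token by token from the image alone.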

I have the same question and would appreciate a clarification. Thank you!