End-to-End Dense Video Captioning with Parallel Decoding (ICCV 2021)
ttengwang
[AAAI 2023 Oral] VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning
UARK-AICV