Can segmentation and multimodal understanding (e.g., VQA) tasks be trained simultaneously?

Open • binzhouu opened this issue 3 months ago • 0 comments

Oct 14 '25 05:10 binzhouu