gill
Why don't you use a universal representation in one task?
I am curious: why don't you use a universal representation within a single task? For example, input: [image] + caption, output: caption + [IMG1]...[IMGn].