recognize-anything
Why do the tags and the caption predicted by Tag2Text differ? Why doesn't Tag2Text use the specific tags given by the user?
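When the input is only an image, the caption generated by Tag2Text does not seem to use all of the tags it generates. Is that expected? Also, when the input to Tag2Text is an image plus several specific tags, the generated caption may not contain those specific tags either. Why is that?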