CLF

Results 2 issues of CLF

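To enable a larger batch size, we crop them into 512×512 patches during training. Since the crops are taken randomly during training, are the text descriptions for these patches also extracted online, using LLaVA to generate the captions?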
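
To make the question concrete, here is a minimal sketch of per-step random cropping. It is an assumption about what "crop them into 512×512 patches" could look like, not the authors' code; the function name random_crop_box and its parameters are hypothetical. Because the box is re-sampled on every call, a caption computed once for the full image may not describe the patch, which is why the captioning step might need to run online.

```python
import random


def random_crop_box(width, height, patch=512, rng=None):
    """Sample a (left, top, right, bottom) box for a patch x patch crop.

    Hypothetical helper for illustration only; the name and signature
    are assumptions, not taken from the paper or its code.
    """
    rng = rng or random.Random()
    if width < patch or height < patch:
        raise ValueError("image is smaller than the requested patch size")
    # randint is inclusive on both ends, so the patch always fits.
    left = rng.randint(0, width - patch)
    top = rng.randint(0, height - patch)
    return left, top, left + patch, top + patch


# A new box is drawn each training step, so the visible content changes.
box = random_crop_box(1024, 768, rng=random.Random(0))
```

The returned tuple follows the (left, top, right, bottom) convention accepted by common image libraries for cropping, so it could be passed to a crop call and the resulting patch handed to a captioner.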

We use the semantic segmentation subset of OpenImage v6 [15] as the main dataset for multi-task prompt tuning. In addition, following Smartbrush [32], we use segmentation labels and BLIP captions [16] as...