MiniCPM-o
Can we use in-context multimodal data for finetuning?
Thanks for your great work! However, it seems that SFT currently supports only samples that contain a single image. Can we use in-context multimodal data (i.e., samples containing multiple images) for finetuning?