hmd78
How about getting multi-modal embeddings? Something like the output of the QFormer in BLIP, which I think corresponds to the output of Qllama in your proposed work.