hmd78 comments

hmd78

How about getting multi-modal embeddings? Something like the output of the QFormer in BLIP, which I think corresponds to the output of Qllama in your proposed work.