Can gpl be used on Chinese models?

Open wduo opened this issue 2 years ago • 6 comments

Great job! Can I use gpl on Chinese models? Which query generator model should I use? Which base model should be used? Which retrieval model should be used? Looking forward to your reply. Thanks. @jcklie @reckart @dpetrak @nreimers @mbugert

wduo avatar Apr 03 '22 08:04 wduo

At the moment we have the doc2query model only for English. Also the Cross-Encoder is only available for English.

But they could be trained on this new dataset: https://arxiv.org/abs/2203.10232

@kwang2049 What do you think, should we train doc2query & cross-encoder for Chinese?

nreimers avatar Apr 04 '22 22:04 nreimers

I would really appreciate it if you could train doc2query & cross-encoder for Chinese. Thanks a lot!

liushenglei avatar Apr 11 '22 02:04 liushenglei

Hi @liushenglei, thanks for your attention! Sorry for the late reply. I have just come back from my holiday:).

@nreimers yes! I am also very interested in that and would be very happy if there were some models for my mother tongue:). I think the big question is the training data. Do you have any suggestions? Personally, I only know of Baidu's DuReader_retrieval. It has >80K query-passage pairs obtained from the Baidu search engine.

kwang2049 avatar Apr 11 '22 14:04 kwang2049

@kwang2049 @liushenglei @wduo Can you please keep me in the loop? I am also interested in the CN model. My email is [email protected]. Can we connect? Thank you.

maxdata avatar Apr 11 '22 15:04 maxdata

@kwang2049 I think the DuReader dataset is good. But as I don't know Chinese, I'm not able to use that dataset, since the explanations etc. are mostly in Chinese.

nreimers avatar Apr 11 '22 15:04 nreimers

> @kwang2049 @liushenglei @wduo Can you please keep me in the loop? I am also interested in the CN model. My email is [email protected]. Can we connect? Thank you.

Yeah, sure. I think we can just write and post things here for now. I will also post updates here if I find anything new on this topic:)

kwang2049 avatar Apr 11 '22 16:04 kwang2049