How do I use SimCSE on my own dataset?

Open skye95git opened this issue 3 years ago • 1 comments

I'm working on a search task, and the pre-trained model I'm using is RoBERTa-base. I would like to add SimCSE on top of it. How do I use SimCSE on my own dataset?

skye95git avatar Jul 20 '22 08:07 skye95git

Hi,

If you want to train SimCSE on your own dataset, you can simply replace our training data with your own data in the same format. We already provide an example training script in the README.
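As a rough sketch of what "the same format" could look like for the unsupervised case: this assumes the training file is plain text with one sentence per line, which you should confirm against the README and the example data before relying on it. The function name `write_unsup_train_file` and the output path are made up for illustration and are not part of the repo.

```python
from pathlib import Path


def write_unsup_train_file(sentences, out_path):
    """Write sentences to a text file, one per line.

    Assumption: the unsupervised training file is a plain-text file with
    one sentence per line. This helper is hypothetical, not a repo API.
    """
    seen = set()
    kept = []
    for sentence in sentences:
        # Collapse internal whitespace/newlines so each sentence stays on one line.
        cleaned = " ".join(sentence.split())
        # Skip empty lines and exact duplicates.
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            kept.append(cleaned)
    Path(out_path).write_text("\n".join(kept) + "\n", encoding="utf-8")
    return len(kept)


if __name__ == "__main__":
    corpus = [
        "how to reverse a list in python",
        "  how to reverse a list\nin python ",  # duplicate after cleanup
        "",
        "read a csv file with pandas",
    ]
    count = write_unsup_train_file(corpus, "my_train_data.txt")
    print(f"wrote {count} sentences")
```

The resulting file would then be the one you point the training script at in place of the provided training data.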

gaotianyu1350 avatar Jul 24 '22 20:07 gaotianyu1350

Hi, I guess you mean we can prepare the data and use the shell script to train our own model. But I wonder how to use the installed module (pip install simcse) to train our own model.

adhb22 avatar Aug 17 '22 03:08 adhb22

Hi,

The pip package cannot be used to train your own model. To do that, you need to use this GitHub repo and follow the README.

gaotianyu1350 avatar Aug 29 '22 13:08 gaotianyu1350

Hi, can I train and evaluate SimCSE on my own datasets? Although I can train it on my dataset by setting "--train_file", I don't know how to evaluate SimCSE on my test set. Judging from your source code, it seems that SimCSE can only be evaluated on some specific tasks.

TomasAndersonFang avatar Sep 07 '22 05:09 TomasAndersonFang

Hi,

We use a modified version of SentEval for evaluation. To evaluate on your own file, you can modify the SentEval part of the code. If you want a HIT@N (retrieval-style) evaluation, you will have to implement your own evaluation protocol. This repo might be helpful for retrieval-style evaluation: https://github.com/castorini/pyserini
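For a small test set, a retrieval-style protocol can be sketched in a few lines. The example below is only an illustration, not the repo's evaluation code: it assumes you have already encoded your queries and candidate documents into vectors with your trained model, and that each query has exactly one gold document index. The names `cosine` and `hit_at_n` are hypothetical helpers.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def hit_at_n(query_vecs, doc_vecs, gold_doc_ids, n=10):
    """Fraction of queries whose gold document is ranked in the top n.

    Assumption: gold_doc_ids[i] is the index into doc_vecs of the single
    relevant document for query i.
    """
    if not query_vecs:
        raise ValueError("need at least one query")
    hits = 0
    for query, gold in zip(query_vecs, gold_doc_ids):
        # Rank all document indices by similarity to the query, best first.
        ranked = sorted(
            range(len(doc_vecs)),
            key=lambda i: cosine(query, doc_vecs[i]),
            reverse=True,
        )
        if gold in ranked[:n]:
            hits += 1
    return hits / len(query_vecs)


if __name__ == "__main__":
    # Toy 2-d "embeddings" standing in for real sentence embeddings.
    queries = [[1.0, 0.0], [0.0, 1.0]]
    docs = [[1.0, 0.1], [0.1, 1.0], [-1.0, 0.0]]
    print(hit_at_n(queries, docs, gold_doc_ids=[0, 1], n=1))
```

This brute-force ranking scores every query against every document, so it only suits small collections; for a large corpus an indexed search library such as the one linked above is likely a better fit.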

gaotianyu1350 avatar Sep 12 '22 19:09 gaotianyu1350