Kexin Wang
Hi @kingafy, GPL needs only a corpus.jsonl file (a data sample is [here](https://github.com/UKPLab/gpl/blob/main/sample-data/generated/fiqa/corpus.jsonl)) for a minimal run. Specifically, you need three steps: 1. Prepare your corpus in the same format as this...
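For reference, a minimal sketch of what such a corpus.jsonl could look like; the field names (`_id`, `title`, `text`) follow the BEIR-style sample linked above, so please verify them against your GPL version before training:

```python
import json

# Each line of corpus.jsonl is one JSON document. The field names follow the
# linked sample data; adjust them if your GPL version expects something different.
docs = [
    {"_id": "doc0", "title": "Example title", "text": "The body text of the first document."},
    {"_id": "doc1", "title": "", "text": "Titles can be empty; the text field is what matters."},
]

with open("corpus.jsonl", "w", encoding="utf-8") as f:
    for doc in docs:
        f.write(json.dumps(doc) + "\n")
```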
Hi @christopherfeld, could you please try this toy example: https://github.com/UKPLab/gpl/issues/5#issuecomment-1144256494 and let me know whether it runs in your case?
Hi @christopherfeld, I have created a Google Colab notebook showing the running results of the toy example: https://colab.research.google.com/drive/1Wis4WugIvpnSAc7F7HGBkB38lGvNHTtX?usp=sharing Please have a look :). BTW, I think the error might be due...
Hi @christopherfeld, sorry for the delay. I will look into this in the near future. Your idea sounds great, and I think we can offer this option during pip install.
Hi, it seems that the latest version of HF's trainer only creates a scaler when `sharded_ddp` is enabled: https://github.com/huggingface/transformers/blob/fa6107c97edf7cf725305a34735a57875b67d85e/src/transformers/trainer.py#L637 Does this affect the `tevatron` code? Thanks
I think HF's trainer now uses accelerate's scaler: https://github.com/huggingface/transformers/issues/25021#issuecomment-1647349987
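In case downstream code relied on the trainer exposing its own scaler, a minimal sketch of the standard `torch.cuda.amp` pattern it could fall back to (plain PyTorch, independent of the trainer internals; the tiny model and data are only placeholders):

```python
import torch

model = torch.nn.Linear(16, 2).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()  # create our own scaler instead of relying on the trainer's

inputs = torch.randn(8, 16, device="cuda")
labels = torch.randint(0, 2, (8,), device="cuda")

with torch.cuda.amp.autocast():  # run the forward pass in mixed precision
    loss = torch.nn.functional.cross_entropy(model(inputs), labels)

scaler.scale(loss).backward()  # scale the loss to avoid fp16 gradient underflow
scaler.step(optimizer)         # unscale gradients and take the optimizer step
scaler.update()                # adjust the scale factor for the next iteration
optimizer.zero_grad()
```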
@Rachneet a file is missing: https://github.com/UKP-SQuARE/square-core/blob/49ef708a2193b9057a9b61a12a6b65b63ebe3935/docker-compose.ytt.min.yaml#L218
Does anybody know how to add CI tests for this minimal version? @timbmg @Rachneet That would make the project easier to maintain.
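Not sure what the maintainers prefer, but as a starting point a small pytest smoke test could be wired into CI; the module and endpoint names below are only placeholders and would need to be swapped for the real ones in this repo:

```python
# test_smoke.py -- placeholder smoke tests; replace the module/URL names with the real ones.
import importlib

import pytest


def test_package_importable():
    # Fails fast in CI if the minimal version is not even importable.
    # "square_skill_api" is a placeholder name, not necessarily the actual package.
    assert importlib.import_module("square_skill_api") is not None


@pytest.mark.skip(reason="enable once the minimal docker-compose stack runs in CI")
def test_health_endpoint():
    import requests

    # Placeholder URL; point it at whichever service the minimal compose file exposes.
    resp = requests.get("http://localhost:8080/health", timeout=5)
    assert resp.status_code == 200
```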
Hi @lijiaoyang, I have re-implemented BERT-flow in PyTorch. I carefully ported each line of the original TF code to PyTorch, and the intermediate results are equal...
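To illustrate the kind of check meant by "the intermediate results are equal", a small sketch that compares a dumped TF activation against its PyTorch counterpart (the file names are made up; any intermediate tensor saved from both runs would do):

```python
import numpy as np

# The two arrays stand for the same intermediate activation computed by the
# original TF code and by the PyTorch port (file names are placeholders).
tf_hidden = np.load("tf_hidden_layer12.npy")  # dumped from the TF run
pt_hidden = np.load("pt_hidden_layer12.npy")  # dumped from the PyTorch run

# Assert elementwise agreement up to float32 noise; raises with a diff summary otherwise.
np.testing.assert_allclose(tf_hidden, pt_hidden, rtol=1e-5, atol=1e-6)
print("Intermediate results match.")
```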
Hi @Muennighoff, yeah, we tried that. Actually, what you describe seems to be exactly `SBERT-base-nli-v2`, `SBERT-base-nli-stsb-v2` (zero-shot models) and `SBERT-supervised` (in-domain supervised) in Table 2. All of them were trained...
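For context, the "-v2" NLI models are trained with sentence-transformers' `MultipleNegativesRankingLoss` (in-batch negatives). A minimal sketch of that setup, using toy pairs in place of the real NLI entailment data and the pre-3.0 `fit` API:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Start from a plain BERT checkpoint and train with in-batch negatives (MNRL).
# The toy examples stand in for (anchor, positive) pairs extracted from NLI data.
model = SentenceTransformer("bert-base-uncased")
train_examples = [
    InputExample(texts=["A man is playing a guitar.", "Someone is making music."]),
    InputExample(texts=["A child runs in the park.", "A kid is outdoors."]),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
```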