
Training SGPT for Custom Dataset

Open rajarajanvakil opened this issue 2 years ago • 1 comments

Hi, I read your paper and it's cool. I'm trying to do this on my own dataset, which is huge. Can you please tell me the exact steps to train from scratch to achieve SGPT, both symmetric and asymmetric, for both encoders (Bi-Encoder and Cross-Encoder)? The Cross-Encoder is our main interest. I have one doubt: are you using BERT to produce the Cross-Encoder and Bi-Encoder embeddings? My understanding is that you use BERT as an initial pipeline before feeding its output to GPT to produce the cosine similarity and log probabilities. Please help.

rajarajanvakil · Jun 29 '22 10:06

Hey!

  1. No BERT model is used
  2. For the SGPT Cross-Encoder no training is necessary. Just use the script here. For symmetric search just change the prompt 😇
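
For anyone wondering what "no training" means for the log-probability scoring mentioned in the question, here is a rough sketch of the idea, not the actual SGPT script. It assumes you already have, for each document, the logits a pretrained GPT-style model produced at the positions that predict the query tokens (for example from a causal LM forward pass). The prompt template, `build_prompt`, `query_log_prob`, and `rank_documents` are hypothetical names made up for illustration; the real prompt wording may differ.

```python
import math

# Hypothetical template (an assumption, not necessarily the SGPT prompt):
# the document goes into the prompt and the query is appended after it.
ASYMMETRIC_TEMPLATE = 'The document "{document}" is a good search result for "'


def build_prompt(document, template=ASYMMETRIC_TEMPLATE):
    """Fill the document into the prompt; swap the template for symmetric search."""
    return template.format(document=document)


def log_softmax(logits):
    """Numerically stable log-softmax over one vocabulary-sized list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def query_log_prob(step_logits, query_token_ids):
    """Sum of log p(query token | prompt, earlier query tokens).

    step_logits[i] holds the logits used to predict query_token_ids[i].
    """
    if len(step_logits) != len(query_token_ids):
        raise ValueError("need exactly one logit vector per query token")
    return sum(
        log_softmax(logits)[token_id]
        for logits, token_id in zip(step_logits, query_token_ids)
    )


def rank_documents(scores):
    """Return document indices sorted from highest to lowest score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy logits over a 4-token vocabulary for a 2-token query.
    query_ids = [1, 3]
    doc_a_logits = [[0.0, 2.0, 0.0, 0.0], [0.0, 0.0, 0.0, 2.0]]  # favours the query
    doc_b_logits = [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0]]  # uniform
    scores = [
        query_log_prob(doc_a_logits, query_ids),
        query_log_prob(doc_b_logits, query_ids),
    ]
    print(rank_documents(scores))
```

The model weights are never updated here: each document is scored only by how likely the pretrained model finds the query after the prompt, and documents are re-ranked by that score. Under this sketch, switching between asymmetric and symmetric search would only mean passing a different `template`.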

Muennighoff · Jun 29 '22 12:06