213 comments of Niklas

Sure, you can, but I doubt it would improve performance using e.g. the TSDAE setup. If you have a dataset for supervised fine-tuning, that may be more helpful.
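
For reference, a TSDAE run with `sentence-transformers` roughly looks like the sketch below; the base model name, the sentences, and the hyperparameters are placeholders, not a recommendation.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, datasets, losses

# Placeholder base checkpoint; swap in the model you actually want to adapt.
model_name = "distilroberta-base"
word_embedding_model = models.Transformer(model_name)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), "cls")
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# Unlabeled sentences from the target domain.
train_sentences = ["First example sentence.", "Second example sentence."]

# TSDAE: denoising auto-encoder objective over corrupted inputs.
train_dataset = datasets.DenoisingAutoEncoderDataset(train_sentences)
train_dataloader = DataLoader(train_dataset, batch_size=8, shuffle=True)
train_loss = losses.DenoisingAutoEncoderLoss(
    model, decoder_name_or_path=model_name, tie_encoder_decoder=True
)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    scheduler="constantlr",
    optimizer_params={"lr": 3e-5},
    show_progress_bar=True,
)
```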

Hey, sounds exciting! I have only tried `accelerate`, not `accelerate + deepspeed`, but I think it should work with deepspeed, too. Let me know how it goes!
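
The nice part is that the training script itself should not need to change; a rough sketch (the model, loss, and data below are dummies just to show the pattern):

```python
import torch
from accelerate import Accelerator

# The same script runs with plain `accelerate` or `accelerate` + DeepSpeed:
# the backend is chosen via `accelerate config` / `accelerate launch`, not in code.
accelerator = Accelerator()

model = torch.nn.Linear(768, 768)                       # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
dataloader = torch.utils.data.DataLoader(torch.randn(64, 768), batch_size=8)

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for batch in dataloader:
    optimizer.zero_grad()
    loss = model(batch).pow(2).mean()                   # dummy loss
    accelerator.backward(loss)                          # replaces loss.backward()
    optimizer.step()
```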

The chunk size does not affect empirical results. Use the highest one that works for you! The higher it is, the faster the training. A few other factors affect the...
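
To illustrate why the chunk size is only a memory/speed knob (a sketch, not the actual training code): the batch is simply encoded in slices, and the loss is still computed over the full batch, so the embeddings, and therefore the results, are identical for any chunk size.

```python
import torch

def encode_in_chunks(encoder, batch: torch.Tensor, chunk_size: int) -> torch.Tensor:
    """Encode a large batch in smaller slices to bound peak memory.
    The concatenated output is the same for any chunk_size; only
    speed and memory usage change."""
    chunks = [encoder(batch[i:i + chunk_size]) for i in range(0, len(batch), chunk_size)]
    return torch.cat(chunks, dim=0)

encoder = torch.nn.Linear(16, 4)                 # stand-in for the real encoder
batch = torch.randn(32, 16)
emb_small = encode_in_chunks(encoder, batch, chunk_size=4)
emb_large = encode_in_chunks(encoder, batch, chunk_size=32)
assert torch.allclose(emb_small, emb_large)      # identical results
```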

I didn't experiment extensively with the LRs - I think they are based on the SentenceTransformer defaults. I found that adjusting the LR alongside the batch size works best. E.g. for bs=1024, I used...

> Hello Niklas, I have a question regarding reproducing SGPT's results. On the [mteb leaderboard](https://huggingface.co/spaces/mteb/leaderboard), the 125M-weightedmean-msmarco-specb-bitfit model achieves 12.21 NDCG@10 on SCIDOCS. However, I wasn't able to reproduce the result...
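
For reference, a minimal MTEB run for that task roughly looks like the sketch below. It assumes the checkpoint loads directly with sentence-transformers; to match the leaderboard numbers exactly, the specb bracket tokens also have to be applied to queries and documents as in the sgpt repo's evaluation code, which this sketch leaves out.

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Model from the leaderboard entry discussed above.
model = SentenceTransformer("Muennighoff/SGPT-125M-weightedmean-msmarco-specb-bitfit")

# Run only SCIDOCS and write per-task scores (incl. NDCG@10) to disk.
evaluation = MTEB(tasks=["SCIDOCS"])
evaluation.run(model, output_folder="results/sgpt-125m")   # placeholder folder
```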

For inference, I think you can use accelerate for that; check https://github.com/huggingface/accelerate/issues/769
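
A rough sketch of the idea from that issue: shard the sentences across processes, encode each shard locally, and collect the results afterwards. The model name is just an example; launch with `accelerate launch script.py`.

```python
from accelerate import Accelerator
from sentence_transformers import SentenceTransformer

accelerator = Accelerator()
model = SentenceTransformer(
    "sentence-transformers/all-MiniLM-L6-v2", device=str(accelerator.device)
)

sentences = [f"sentence {i}" for i in range(1000)]

# Manual sharding: each process encodes an interleaved slice of the data.
shard = sentences[accelerator.process_index::accelerator.num_processes]
embeddings = model.encode(shard)

# Collecting the shards (e.g. writing per-process files or gathering tensors)
# is left out here and discussed in the linked issue.
```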

Hmm, you could generate a lot of sentences with a GPT model using e.g. the `huggingface/transformers` library & then use models from this library to score their similarity & keep...
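
Roughly what I mean, as a sketch; the model names, prompt, and filtering threshold are just placeholders:

```python
from transformers import pipeline
from sentence_transformers import SentenceTransformer, util

# 1) Generate candidate sentences with a GPT-style model.
generator = pipeline("text-generation", model="gpt2")
outputs = generator(
    "The movie was", max_new_tokens=20, num_return_sequences=5, do_sample=True
)
candidates = [o["generated_text"] for o in outputs]

# 2) Score similarity against a reference sentence and keep the closest ones.
scorer = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
reference = scorer.encode("The movie was great.", convert_to_tensor=True)
embeddings = scorer.encode(candidates, convert_to_tensor=True)
scores = util.cos_sim(reference, embeddings)[0]
kept = [c for c, s in zip(candidates, scores) if s > 0.5]   # arbitrary threshold
```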

Hey! 1. No BERT model is used. 2. For the SGPT Cross-Encoder, no training is necessary. Just use the script [here](https://github.com/Muennighoff/sgpt#asymmetric-semantic-search). For symmetric search, just change the prompt 😇
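
The general idea of the Cross-Encoder, sketched with a small stand-in model and a made-up prompt (the actual prompts and model are defined in the linked script): score a (document, query) pair by the mean log-probability the causal LM assigns to the query tokens given a prompt containing the document.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in checkpoint for illustration only.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def log_prob_score(document: str, query: str) -> float:
    """Mean log-probability of the query tokens given a document prompt."""
    prompt = f'Document: "{document}"\nQuery: '          # placeholder prompt
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    query_ids = tokenizer(query, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, query_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Each query token is predicted from the logits at the previous position.
    log_probs = torch.log_softmax(logits[0, prompt_ids.shape[1] - 1:-1], dim=-1)
    token_log_probs = log_probs.gather(1, query_ids[0].unsqueeze(-1)).squeeze(-1)
    return token_log_probs.mean().item()

score = log_prob_score("Paris is the capital of France.", "What is the capital of France?")
```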

Hey @gante, thanks for getting back! I'm not sure what you mean by `pulling the model weights all the way down to the compute cores`? In your example, all samples...

Sure, if you want to finetune, you can follow some of what is outlined in this issue: https://github.com/Muennighoff/sgpt/issues/2. For asymmetric search (e.g. retrieval), you can also try https://huggingface.co/bigscience/sgpt-bloom-7b1-msmarco, which has...
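
The core of using these bi-encoders is position-weighted mean pooling over the hidden states of the causal LM. Below is a rough sketch with a small stand-in model; see the README and the model card for the exact query/document prompt tokens (e.g. the specb brackets) needed to match the released checkpoints.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Small stand-in checkpoint for illustration; swap in the actual SGPT model.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModel.from_pretrained("gpt2")

def weighted_mean_embed(texts):
    """Position-weighted mean pooling: later tokens get higher weight,
    which suits causal LMs where later states have seen more context."""
    batch = tokenizer(texts, padding=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state                 # (B, T, H)
    mask = batch.attention_mask.unsqueeze(-1).float()              # (B, T, 1)
    weights = torch.arange(1, hidden.shape[1] + 1).view(1, -1, 1).float()
    summed = (hidden * mask * weights).sum(dim=1)
    norm = (mask * weights).sum(dim=1)
    return summed / norm

query_emb = weighted_mean_embed(["What is the capital of France?"])
doc_emb = weighted_mean_embed(["Paris is the capital of France."])
score = torch.nn.functional.cosine_similarity(query_emb, doc_emb)
```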