213 comments of Niklas

Sure, you can, but I doubt it would improve performance using e.g. the TSDAE setup. If you have a dataset for supervised fine-tuning, that may be more helpful.
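
For reference, a TSDAE run with `sentence-transformers` roughly looks like the sketch below; the base model name, the sentences, and the hyperparameters are placeholders, not a recommendation.

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, datasets, losses

# Placeholder base checkpoint; swap in the model you actually want to adapt.
model_name = "distilroberta-base"
word_embedding_model = models.Transformer(model_name)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), "cls")
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# Unlabeled sentences from the target domain.
train_sentences = ["First example sentence.", "Second example sentence."]

# TSDAE: denoising auto-encoder objective over corrupted inputs.
train_dataset = datasets.DenoisingAutoEncoderDataset(train_sentences)
train_dataloader = DataLoader(train_dataset, batch_size=8, shuffle=True)
train_loss = losses.DenoisingAutoEncoderLoss(
    model, decoder_name_or_path=model_name, tie_encoder_decoder=True
)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=1,
    scheduler="constantlr",
    optimizer_params={"lr": 3e-5},
    show_progress_bar=True,
)
```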

Hey, sounds exciting! I have only tried `accelerate`, not `accelerate + deepspeed`, but I think it should work with deepspeed, too. Let me know how it goes!
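
The nice part is that the training script itself should not need to change; a rough sketch (the model, loss, and data below are dummies just to show the pattern):

```python
import torch
from accelerate import Accelerator

# The same script runs with plain `accelerate` or `accelerate` + DeepSpeed:
# the backend is chosen via `accelerate config` / `accelerate launch`, not in code.
accelerator = Accelerator()

model = torch.nn.Linear(768, 768)                       # stand-in for the real model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
dataloader = torch.utils.data.DataLoader(torch.randn(64, 768), batch_size=8)

model, optimizer, dataloader = accelerator.prepare(model, optimizer, dataloader)

for batch in dataloader:
    optimizer.zero_grad()
    loss = model(batch).pow(2).mean()                   # dummy loss
    accelerator.backward(loss)                          # replaces loss.backward()
    optimizer.step()
```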

The chunk size does not affect empirical results. Use the highest one that works for you! The higher it is, the faster the training. A few other factors affect the...
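
To illustrate why the chunk size is only a memory/speed knob (a sketch, not the actual training code): the batch is simply encoded in slices, and the loss is still computed over the full batch, so the embeddings, and therefore the results, are identical for any chunk size.

```python
import torch

def encode_in_chunks(encoder, batch: torch.Tensor, chunk_size: int) -> torch.Tensor:
    """Encode a large batch in smaller slices to bound peak memory.
    The concatenated output is the same for any chunk_size; only
    speed and memory usage change."""
    chunks = [encoder(batch[i:i + chunk_size]) for i in range(0, len(batch), chunk_size)]
    return torch.cat(chunks, dim=0)

encoder = torch.nn.Linear(16, 4)                 # stand-in for the real encoder
batch = torch.randn(32, 16)
emb_small = encode_in_chunks(encoder, batch, chunk_size=4)
emb_large = encode_in_chunks(encoder, batch, chunk_size=32)
assert torch.allclose(emb_small, emb_large)      # identical results
```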

I didn't experiment extensively with the LRs - I think they are based on the SentenceTransformer defaults. I found that adjusting the LR alongside the batch size works best. E.g. for bs=1024, I used...

> Hello Niklas, I have a question regarding reproducing SGPT's results. On the [mteb leaderboard](https://huggingface.co/spaces/mteb/leaderboard), the 125M-weightedmean-msmarco-specb-bitfit model achieves 12.21 NDCG@10 on SCIDOCS. However, I wasn't able to reproduce the result...
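
For reference, a minimal MTEB run for that task roughly looks like the sketch below. It assumes the checkpoint loads directly with sentence-transformers; to match the leaderboard numbers exactly, the specb bracket tokens also have to be applied to queries and documents as in the sgpt repo's evaluation code, which this sketch leaves out.

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Model from the leaderboard entry discussed above.
model = SentenceTransformer("Muennighoff/SGPT-125M-weightedmean-msmarco-specb-bitfit")

# Run only SCIDOCS and write per-task scores (incl. NDCG@10) to disk.
evaluation = MTEB(tasks=["SCIDOCS"])
evaluation.run(model, output_folder="results/sgpt-125m")   # placeholder folder
```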

For inference, I think you can use accelerate for that; check https://github.com/huggingface/accelerate/issues/769
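
A rough sketch of the idea from that issue: shard the sentences across processes, encode each shard locally, and collect the results afterwards. The model name is just an example; launch with `accelerate launch script.py`.

```python
from accelerate import Accelerator
from sentence_transformers import SentenceTransformer

accelerator = Accelerator()
model = SentenceTransformer(
    "sentence-transformers/all-MiniLM-L6-v2", device=str(accelerator.device)
)

sentences = [f"sentence {i}" for i in range(1000)]

# Manual sharding: each process encodes an interleaved slice of the data.
shard = sentences[accelerator.process_index::accelerator.num_processes]
embeddings = model.encode(shard)

# Collecting the shards (e.g. writing per-process files or gathering tensors)
# is left out here and discussed in the linked issue.
```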

Hmm, you could generate a lot of sentences with a GPT model using e.g. the `huggingface/transformers` library & then use models from this library to score their similarity & keep...
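
Roughly what I mean, as a sketch; the model names, prompt, and filtering threshold are just placeholders:

```python
from transformers import pipeline
from sentence_transformers import SentenceTransformer, util

# 1) Generate candidate sentences with a GPT-style model.
generator = pipeline("text-generation", model="gpt2")
outputs = generator(
    "The movie was", max_new_tokens=20, num_return_sequences=5, do_sample=True
)
candidates = [o["generated_text"] for o in outputs]

# 2) Score similarity against a reference sentence and keep the closest ones.
scorer = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
reference = scorer.encode("The movie was great.", convert_to_tensor=True)
embeddings = scorer.encode(candidates, convert_to_tensor=True)
scores = util.cos_sim(reference, embeddings)[0]
kept = [c for c, s in zip(candidates, scores) if s > 0.5]   # arbitrary threshold
```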

Hey! 1. No BERT model is used. 2. For the SGPT Cross-Encoder, no training is necessary. Just use the script [here](https://github.com/Muennighoff/sgpt#asymmetric-semantic-search). For symmetric search, just change the prompt 😇
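
The general idea of the Cross-Encoder, sketched with a small stand-in model and a made-up prompt (the actual prompts and model are defined in the linked script): score a (document, query) pair by the mean log-probability the causal LM assigns to the query tokens given a prompt containing the document.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in checkpoint for illustration only.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

def log_prob_score(document: str, query: str) -> float:
    """Mean log-probability of the query tokens given a document prompt."""
    prompt = f'Document: "{document}"\nQuery: '          # placeholder prompt
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    query_ids = tokenizer(query, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, query_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Each query token is predicted from the logits at the previous position.
    log_probs = torch.log_softmax(logits[0, prompt_ids.shape[1] - 1:-1], dim=-1)
    token_log_probs = log_probs.gather(1, query_ids[0].unsqueeze(-1)).squeeze(-1)
    return token_log_probs.mean().item()

score = log_prob_score("Paris is the capital of France.", "What is the capital of France?")
```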

Hey @gante, thanks for getting back! I'm not sure what you mean by `pulling the model weights all the way down to the compute cores`? In your example, all samples...

Sure, if you want to finetune, you can follow some of what is outlined in this issue: https://github.com/Muennighoff/sgpt/issues/2. For asymmetric search (e.g. retrieval), you can also try https://huggingface.co/bigscience/sgpt-bloom-7b1-msmarco, which has...
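
The core of using these bi-encoders is position-weighted mean pooling over the hidden states of the causal LM. Below is a rough sketch with a small stand-in model; see the README and the model card for the exact query/document prompt tokens (e.g. the specb brackets) needed to match the released checkpoints.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Small stand-in checkpoint for illustration; swap in the actual SGPT model.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = AutoModel.from_pretrained("gpt2")

def weighted_mean_embed(texts):
    """Position-weighted mean pooling: later tokens get higher weight,
    which suits causal LMs where later states have seen more context."""
    batch = tokenizer(texts, padding=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state                 # (B, T, H)
    mask = batch.attention_mask.unsqueeze(-1).float()              # (B, T, 1)
    weights = torch.arange(1, hidden.shape[1] + 1).view(1, -1, 1).float()
    summed = (hidden * mask * weights).sum(dim=1)
    norm = (mask * weights).sum(dim=1)
    return summed / norm

query_emb = weighted_mean_embed(["What is the capital of France?"])
doc_emb = weighted_mean_embed(["Paris is the capital of France."])
score = torch.nn.functional.cosine_similarity(query_emb, doc_emb)
```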