Shitao Xiao

509 comments by Shitao Xiao

The usage of Kaggle and Colab is the same as Jupyter. You can pip install FlagEmbedding and run the command by adding `!` at the beginning, e.g., ``` ! pip install...
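
For reference, a minimal notebook sketch (the checkpoint name here is just an example, not a requirement):

```
# In a Kaggle/Colab notebook cell, install the package first:
# !pip install -U FlagEmbedding

from FlagEmbedding import FlagModel

# Encode a few sentences with a bge checkpoint (model name chosen as an example).
model = FlagModel("BAAI/bge-small-en-v1.5", use_fp16=True)

sentences = [
    "FlagEmbedding can be used in Kaggle and Colab.",
    "Notebook cells run shell commands via the ! prefix.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
```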

@RKoopal, thanks for your suggestion! MT5 uses the pad_token_id as the starting token when generating decoder_input_ids, but the preprocess function you used doesn't add a special token at the...
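
As an illustration (not the exact preprocess function from that thread), this is how mT5's decoder inputs are expected to start with the pad token:

```
import torch
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

source = tokenizer("an example input", return_tensors="pt")
labels = tokenizer("ein Beispiel", return_tensors="pt").input_ids

# mT5's decoder_start_token_id is the pad token, so decoder_input_ids
# should begin with it (i.e. the labels shifted right by one position).
assert model.config.decoder_start_token_id == tokenizer.pad_token_id
start = torch.full((labels.size(0), 1), model.config.decoder_start_token_id)
decoder_input_ids = torch.cat([start, labels[:, :-1]], dim=-1)

# Passing labels directly would let the model perform this shift internally.
outputs = model(**source, decoder_input_ids=decoder_input_ids)
```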

@zhaobinNF @adol001, we use the official negatives. You can download our dataset from Hugging Face: https://huggingface.co/datasets/Shitao/bge-reranker-data, and then use the t2ranking data in the compressed file.
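
If it helps, one way to pull the whole dataset repo programmatically (assuming `huggingface_hub` is installed) is:

```
from huggingface_hub import snapshot_download

# Download the full dataset repository; the t2ranking data is inside
# one of the compressed files in this snapshot.
local_dir = snapshot_download(repo_id="Shitao/bge-reranker-data", repo_type="dataset")
print(local_dir)
```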

@yanzhang404, I can't be certain about the impact on performance of using it as a negative sample; it depends on your downstream task. Fine-tuning data should ideally match the downstream...

bge-m3 and bge-1.5 share the same pretraining script: https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/pretrain

Hi @sevenandseven, it looks like you haven't downloaded the model files correctly (`OSError: You seem to have cloned a repository without having git-lfs installed. Please install git-lfs and run git...
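
As a quick sanity check (the local path below is a hypothetical clone directory), you can look for git-lfs pointer stubs where the real weight files should be:

```
from pathlib import Path

# If the repository was cloned without git-lfs, the large weight files are
# tiny text "pointer" stubs instead of real binaries.
repo_dir = Path("./bge-m3")  # hypothetical path to your cloned model repo

for weight_file in list(repo_dir.glob("*.bin")) + list(repo_dir.glob("*.safetensors")):
    head = weight_file.read_bytes()[:40]
    if head.startswith(b"version https://git-lfs"):
        print(f"{weight_file.name}: LFS pointer stub -- install git-lfs and run `git lfs pull`")
    else:
        print(f"{weight_file.name}: looks like a real weight file ({weight_file.stat().st_size} bytes)")
```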

Can you share the command you used?

@545999961, please take a look at this issue when it is convenient for you.

@karong398, if you have many candidates, searching on the CPU takes a lot of time. You can use GPUs or reduce the size of the corpus to speed up...
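
For example, a rough faiss sketch (assuming the faiss-gpu build is installed; the random arrays are placeholders for your own embeddings):

```
import numpy as np
import faiss  # the GPU path below needs the faiss-gpu build

# Exact inner-product search over precomputed embeddings; the random arrays
# below stand in for your own (normalized) corpus/query embeddings.
dim = 768
corpus_emb = np.random.rand(100_000, dim).astype("float32")
query_emb = np.random.rand(4, dim).astype("float32")

index = faiss.IndexFlatIP(dim)
index = faiss.index_cpu_to_all_gpus(index)  # move the index onto all available GPUs
index.add(corpus_emb)

scores, ids = index.search(query_emb, 10)  # top-10 candidates per query
print(ids[0])
```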

@jhyeom1545, it could be that there is noise in your data, i.e., incorrect positive and negative samples. You can try filtering the training data.
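
One possible way to do that (the reranker checkpoint and thresholds below are only illustrative assumptions, not a fixed recipe):

```
from FlagEmbedding import FlagReranker

# Score each (query, passage) pair with a reranker and drop suspicious
# positives/negatives; the checkpoint and thresholds are illustrative only.
reranker = FlagReranker("BAAI/bge-reranker-large", use_fp16=True)

def as_list(scores):
    # compute_score may return a single float when given one pair
    return scores if isinstance(scores, list) else [scores]

example = {
    "query": "what is a corpus",
    "pos": ["A corpus is a collection of texts."],
    "neg": ["A corgi is a breed of dog."],
}

pos_scores = as_list(reranker.compute_score([[example["query"], p] for p in example["pos"]]))
neg_scores = as_list(reranker.compute_score([[example["query"], n] for n in example["neg"]]))

# Keep positives the reranker agrees with and negatives it scores low.
example["pos"] = [p for p, s in zip(example["pos"], pos_scores) if s > 0.0]
example["neg"] = [n for n, s in zip(example["neg"], neg_scores) if s < 0.0]
```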