Shitao Xiao

509 comments by Shitao Xiao

The usage of Kaggle and Colab is the same as Jupyter. You can pip install FlagEmbedding and run the command by adding `!` at the beginning, e.g., ``` ! pip install...
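
For reference, a minimal notebook sketch (the checkpoint name here is just an example, not a requirement):

```
# In a Kaggle/Colab notebook cell, install the package first:
# !pip install -U FlagEmbedding

from FlagEmbedding import FlagModel

# Encode a few sentences with a bge checkpoint (model name chosen as an example).
model = FlagModel("BAAI/bge-small-en-v1.5", use_fp16=True)

sentences = [
    "FlagEmbedding can be used in Kaggle and Colab.",
    "Notebook cells run shell commands via the ! prefix.",
]
embeddings = model.encode(sentences)
print(embeddings.shape)
```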

@RKoopal, thanks for your suggestion! MT5 uses the pad_token_id as the starting token when generating decoder_input_ids, but the preprocess function you used doesn't add a special token at the...
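
As an illustration (not the exact preprocess function from that thread), this is how mT5's decoder inputs are expected to start with the pad token:

```
import torch
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

source = tokenizer("an example input", return_tensors="pt")
labels = tokenizer("ein Beispiel", return_tensors="pt").input_ids

# mT5's decoder_start_token_id is the pad token, so decoder_input_ids
# should begin with it (i.e. the labels shifted right by one position).
assert model.config.decoder_start_token_id == tokenizer.pad_token_id
start = torch.full((labels.size(0), 1), model.config.decoder_start_token_id)
decoder_input_ids = torch.cat([start, labels[:, :-1]], dim=-1)

# Passing labels directly would let the model perform this shift internally.
outputs = model(**source, decoder_input_ids=decoder_input_ids)
```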

@zhaobinNF @adol001, we use the official negatives. You can download our dataset from Hugging Face: https://huggingface.co/datasets/Shitao/bge-reranker-data, and then use the t2ranking data in the compressed file.
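
If it helps, one way to pull the whole dataset repo programmatically (assuming `huggingface_hub` is installed) is:

```
from huggingface_hub import snapshot_download

# Download the full dataset repository; the t2ranking data is inside
# one of the compressed files in this snapshot.
local_dir = snapshot_download(repo_id="Shitao/bge-reranker-data", repo_type="dataset")
print(local_dir)
```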

@yanzhang404, I can't be certain about the impact on performance of using it as a negative sample; it depends on your downstream task. Fine-tuning data should ideally match the downstream...

bge-m3 and bge-1.5 share the same pretraining script: https://github.com/FlagOpen/FlagEmbedding/tree/master/examples/pretrain

Hi @sevenandseven, it looks like you haven't downloaded the model files correctly (`OSError: You seem to have cloned a repository without having git-lfs installed. Please install git-lfs and run git...
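
As a quick sanity check (the local path below is a hypothetical clone directory), you can look for git-lfs pointer stubs where the real weight files should be:

```
from pathlib import Path

# If the repository was cloned without git-lfs, the large weight files are
# tiny text "pointer" stubs instead of real binaries.
repo_dir = Path("./bge-m3")  # hypothetical path to your cloned model repo

for weight_file in list(repo_dir.glob("*.bin")) + list(repo_dir.glob("*.safetensors")):
    head = weight_file.read_bytes()[:40]
    if head.startswith(b"version https://git-lfs"):
        print(f"{weight_file.name}: LFS pointer stub -- install git-lfs and run `git lfs pull`")
    else:
        print(f"{weight_file.name}: looks like a real weight file ({weight_file.stat().st_size} bytes)")
```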

Can you share the command you used?

@545999961, please take a look at this issue when it is convenient for you.

@karong398, if you have many candidates, searching on the CPU takes a lot of time. You can use GPUs or reduce the size of the corpus to speed up...
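
For example, a rough faiss sketch (assuming the faiss-gpu build is installed; the random arrays are placeholders for your own embeddings):

```
import numpy as np
import faiss  # the GPU path below needs the faiss-gpu build

# Exact inner-product search over precomputed embeddings; the random arrays
# below stand in for your own (normalized) corpus/query embeddings.
dim = 768
corpus_emb = np.random.rand(100_000, dim).astype("float32")
query_emb = np.random.rand(4, dim).astype("float32")

index = faiss.IndexFlatIP(dim)
index = faiss.index_cpu_to_all_gpus(index)  # move the index onto all available GPUs
index.add(corpus_emb)

scores, ids = index.search(query_emb, 10)  # top-10 candidates per query
print(ids[0])
```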

@jhyeom1545, it could be that there is noise in your data, i.e., incorrect positive and negative samples. You can try filtering the training data.
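
One possible way to do that (the reranker checkpoint and thresholds below are only illustrative assumptions, not a fixed recipe):

```
from FlagEmbedding import FlagReranker

# Score each (query, passage) pair with a reranker and drop suspicious
# positives/negatives; the checkpoint and thresholds are illustrative only.
reranker = FlagReranker("BAAI/bge-reranker-large", use_fp16=True)

def as_list(scores):
    # compute_score may return a single float when given one pair
    return scores if isinstance(scores, list) else [scores]

example = {
    "query": "what is a corpus",
    "pos": ["A corpus is a collection of texts."],
    "neg": ["A corgi is a breed of dog."],
}

pos_scores = as_list(reranker.compute_score([[example["query"], p] for p in example["pos"]]))
neg_scores = as_list(reranker.compute_score([[example["query"], n] for n in example["neg"]]))

# Keep positives the reranker agrees with and negatives it scores low.
example["pos"] = [p for p, s in zip(example["pos"], pos_scores) if s > 0.0]
example["neg"] = [n for n, s in zip(example["neg"], neg_scores) if s < 0.0]
```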