nntoan209

4 issues by nntoan209

In the split_data_by_length.py script in BGE-M3, after the dataset is filtered by the "max_length" field, the "idx" field is somehow changed, so `split_dataset = dataset.select(idxs["idx"])` will result in the wrong...
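A minimal sketch of one defensive pattern for this kind of mismatch, in plain Python with toy rows rather than the actual `datasets` objects. The helper names `idx_matches_position` and `select_by_idx` are hypothetical and not part of the BGE-M3 code. The cause in the issue is not known from the truncated text, so this only shows how to detect that stored "idx" values no longer equal row positions, and how to select by value instead of by position.

```python
def idx_matches_position(rows):
    """True only if every row's stored "idx" equals its current position."""
    return all(row["idx"] == pos for pos, row in enumerate(rows))


def select_by_idx(rows, wanted_idx):
    """Select rows by their stored "idx" value, not by list position."""
    by_idx = {row["idx"]: row for row in rows}
    return [by_idx[i] for i in wanted_idx]


# Toy rows: "idx" was assigned once, then the row order changed (reversed here).
rows = [
    {"idx": 3, "max_length": 950},
    {"idx": 2, "max_length": 200},
    {"idx": 1, "max_length": 900},
    {"idx": 0, "max_length": 100},
]

# Stored idx values of the rows that pass a length filter.
wanted = sorted(r["idx"] for r in rows if r["max_length"] > 500)

# Positional selection trusts that idx == position, which is false here.
positional = [rows[i] for i in wanted]
# Value-based selection stays correct regardless of row order.
by_value = select_by_idx(rows, wanted)
```

With the rows above, positional selection returns a row with `max_length` 200, while the value-based lookup returns only the long rows. Checking `idx_matches_position` before any positional select is a cheap way to surface the problem early.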

Is there a faster way to run the compute_score function for the BGE-M3 model? According to the code, it has to encode the whole corpus num_queries times, and if num_queries...
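The general idea behind avoiding repeated encoding can be sketched as follows: encode the corpus once, encode the queries once, then score every pair from the cached vectors. This is an illustration only. `toy_encode` is a hypothetical stand-in (letter counts, L2-normalized), not the BGE-M3 encoder, and the sketch covers only dot-product scoring, not whatever `compute_score` does internally.

```python
import math
import string


def toy_encode(texts, counter):
    """Hypothetical encoder: normalized letter-count vectors; counts calls."""
    counter["encoded"] += len(texts)
    vectors = []
    for text in texts:
        vec = [text.count(ch) for ch in string.ascii_lowercase]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def score_all(queries, corpus, counter):
    """Encode each side once, then score all (query, passage) pairs."""
    q_vecs = toy_encode(queries, counter)
    c_vecs = toy_encode(corpus, counter)  # corpus is encoded a single time
    return [
        [sum(a * b for a, b in zip(q, c)) for c in c_vecs]
        for q in q_vecs
    ]


counter = {"encoded": 0}
queries = ["apple pie", "blue sky"]
corpus = ["apple tart", "sky is blue", "green tea"]
scores = score_all(queries, corpus, counter)
```

Here the number of texts encoded is `len(queries) + len(corpus)` rather than `len(queries) * len(corpus)`, and `scores[i][j]` holds the score of query `i` against passage `j`.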

The BGE-M3 paper mentioned the MCLS (Multiple CLS) strategy to enhance the model’s long-text capabilities without the need for training. Does this repo contain the implementation for this strategy?

Hi, can you please explain these problems: - The training scripts are not complete. In the paper, you stated that there are 2 training phases: Behavioral cloning (BC) and AgentEvol,...