nntoan209 issues




Fix "idx" bug in split_data_by_length.py of BGE-M3

In the split_data_by_length.py code inside BGE-M3, after filtering the dataset by the "max_length" field, the "idx" field is somehow changed, so `split_dataset = dataset.select(idxs["idx"])` will result in the wrong...
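The snippet above does not show the script's internals, so the following is only a plain-Python illustration of the general pitfall, not the actual fix. It assumes rows are dicts with a "max_length" field and that the goal is to pick rows from the unfiltered data. The key is to record each row's original position before filtering and select with those stored indices, rather than with positions inside the filtered subset. The function name `select_by_max_length` and the `limit` parameter are hypothetical.

```python
from typing import Any, Dict, List


def select_by_max_length(rows: List[Dict[str, Any]], limit: int) -> List[Dict[str, Any]]:
    # Attach each row's original position BEFORE filtering, so the stored
    # "idx" keeps pointing into the unfiltered data.
    indexed = [{**row, "idx": i} for i, row in enumerate(rows)]
    kept = [r for r in indexed if r["max_length"] <= limit]
    # Select from the ORIGINAL rows using the stored indices. Using the
    # positions 0..len(kept)-1 of the filtered subset instead would only be
    # correct if nothing had been filtered out.
    return [rows[r["idx"]] for r in kept]
```

With rows whose "max_length" values are 100, 900, and 300 and a limit of 512, this keeps the first and third rows; a positional selection on the filtered subset would wrongly return the first and second.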

BGE-M3 compute_score function is very inefficient

Is there a faster way to run the `compute_score` function for the BGE-M3 model? According to the code, it has to encode the whole corpus `num_queries` times, and if `num_queries`...
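One common way around repeated encoding is to encode the queries and the corpus once each and then score every query against every passage. The sketch below assumes a hypothetical `encode` callable that maps a list of texts to a list of vectors; it is not the FlagEmbedding API. It covers only a dense dot-product score. The sparse and multi-vector scores that BGE-M3 can also produce would need the same encode-once treatment, which is not shown here.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def score_all(
    queries: Sequence[str],
    corpus: Sequence[str],
    encode: Callable[[Sequence[str]], List[Vector]],  # hypothetical encoder
) -> List[List[float]]:
    # Encode each side exactly once instead of re-encoding the corpus per query.
    q_emb = encode(queries)
    c_emb = encode(corpus)
    # scores[i][j] is the dense dot-product score of query i against passage j.
    return [[dot(q, c) for c in c_emb] for q in q_emb]
```

With n queries and m passages this performs n + m encodings, instead of the n × m implied by re-encoding the corpus for every query. Only the cheap scoring step remains quadratic.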

BGE-M3 MCLS implementation

The BGE-M3 paper mentioned the MCLS (Multiple CLS) strategy to enhance the model’s long-text capabilities without the need for training. Does this repo contain the implementation for this strategy?
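As the BGE-M3 paper describes it, MCLS inserts several CLS tokens into a long input so that each one captures the nearby tokens, and the final embedding is the average of the last hidden states of those CLS tokens. The sketch below shows only that bookkeeping in plain Python, under stated assumptions: `cls_id` and `interval` are hypothetical parameters, and `hidden` stands in for the per-token last hidden states that a real encoder would return after one forward pass over the expanded sequence. It does not claim to match any implementation in this repo.

```python
from typing import List, Sequence, Tuple


def insert_cls(token_ids: Sequence[int], cls_id: int, interval: int) -> Tuple[List[int], List[int]]:
    """Insert cls_id before every `interval` tokens; return the new ids and CLS positions."""
    out: List[int] = []
    cls_positions: List[int] = []
    # max(..., 1) guarantees at least one CLS token even for an empty input.
    for start in range(0, max(len(token_ids), 1), interval):
        cls_positions.append(len(out))
        out.append(cls_id)
        out.extend(token_ids[start:start + interval])
    return out, cls_positions


def mean_of_positions(hidden: Sequence[Sequence[float]], positions: Sequence[int]) -> List[float]:
    """Average the hidden-state vectors found at the given (CLS) positions."""
    dim = len(hidden[0])
    return [sum(hidden[p][d] for p in positions) / len(positions) for d in range(dim)]
```

For example, inserting a CLS id of 0 every 2 tokens into `[10, 11, 12, 13, 14]` gives `[0, 10, 11, 0, 12, 13, 0, 14]`, with CLS tokens at positions 0, 3, and 6.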

AgentEvol-7B has too short a context length; training scripts are not the same as in the paper

Hi, can you please explain these problems:
- The training scripts are not complete. In the paper, you stated that there are two training phases, behavioral cloning (BC) and AgentEvol,...