Wang Siyuan comments

Results: 7 comments of Wang Siyuan

Discrepancy between Llama-3.1-8B-Instruct test results on LongBench v2 and the leaderboard

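> Hello,
>
> My test results for Llama-3.1-8B-Instruct are as follows:
>
> | Model | Overall | Easy | Hard | Short | Medium | Long |
> | --- | --- | --- | --- | --- | --- | --- |
> | Llama-3.1-8B-Instruct | 29.0 | 30.7 | 28.0 | 33.9 | 25.6 | 27.8 |
>
> This is one point off the Overall score on the leaderboard (29.0 vs 30.0). My environment is as follows:
>
> ...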

Potential Error in Benchmark Data – Incorrect Answer for Question ID 6701cda0bb02136c067cb6eb

> Thanks for pointing it out! We will soon have our annotator check the data and update the dataset.

Thank you for your prompt response and for looking into the...

Running the GRPO program on multiple nodes causes it to hang.

Same problem here.

Error for AGIEval when using fewshot

> Hi! So the dataset we are using is missing the fewshot split. It uses the test split for the fewshot samples and looks like one of the rows in...

Reproducibility Issues

The randomness here is not only caused by the random seed. If not explicitly set, the default random seed is 0, so it is reasonable that the results may differ...

LlamaForSequenceClassification forward method show different results with input_ids/inputs_embeds

[input_embeds not checking pad token](https://github.com/huggingface/transformers/blob/3f06f95ebe617b192251ef756518690f5bc7ff76/src/transformers/models/llama/modeling_llama.py#L1316C17-L1318C70)

```python
if self.config.pad_token_id is None:
    sequence_lengths = -1
else:
    if input_ids is not None:
        # if no pad token found, use modulo instead of reverse...
```

LlamaForSequenceClassification forward method show different results with input_ids/inputs_embeds

> [input_embeds not checking pad token](https://github.com/huggingface/transformers/blob/3f06f95ebe617b192251ef756518690f5bc7ff76/src/transformers/models/phi3/modeling_phi3.py#L1490C9-L1499C38)
>
> ```python
> if self.config.pad_token_id is None:
>     sequence_lengths = -1
> else:
>     if input_ids is not None:
>         # if no...
> ```
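
To illustrate why the two input paths can disagree, here is a minimal pure-Python sketch of the position-selection logic the snippet above appears to implement. It is not the library's code: `pooled_position` is a hypothetical helper, and the behaviour of the truncated branch (take the position just before the first pad token, with a modulo for the "no pad found" case) is an assumption based on the visible comment.

```python
from typing import List, Optional


def pooled_position(
    input_ids: Optional[List[int]],
    pad_token_id: Optional[int],
    seq_len: int,
) -> int:
    """Return the sequence position whose logits would be pooled.

    Hypothetical helper for illustration; mirrors the assumed logic only.
    """
    if pad_token_id is None or input_ids is None:
        # Without token ids (e.g. only embeddings were passed) there is
        # nothing to compare against the pad id, so the last position is
        # used whether or not it is padding.
        return seq_len - 1
    # Index of the first pad token; 0 if none is found (like argmax over zeros).
    first_pad = input_ids.index(pad_token_id) if pad_token_id in input_ids else 0
    # Step back one position; the modulo maps -1 to the last position.
    return (first_pad - 1) % seq_len


# A right-padded sequence: three real tokens, two pad tokens (pad id 0).
ids = [5, 6, 7, 0, 0]
from_ids = pooled_position(ids, pad_token_id=0, seq_len=len(ids))
from_embeds = pooled_position(None, pad_token_id=0, seq_len=len(ids))
print(from_ids, from_embeds)
```

Under these assumptions the token-id path pools the last real token, while the embeddings path pools the final (padded) position, which would be enough to produce different classification outputs for the same padded input.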