namespace-Pt comments

Results: 50 comments of namespace-Pt

activation beacon Needle In A Haystack test failed

Could you also share `data/book/dinosaurs.txt`, if convenient?

activation beacon Needle In A Haystack test failed

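Hi, I ran the test following the script:

1. Activation Beacon retrieves the needle most of the time at 8K but fails almost every time at 32K. A longer context requires a higher compression ratio (8K -> 2, 32K -> 8), and compression loses information, so performance on the needle task degrades sharply as the context grows. This is consistent with our findings on topic retrieval and passkey retrieval. Notably, though, when the model fails it outputs exactly the same result every time. We had not noticed this before and will investigate it further.
2. Activation Beacon can be combined with retrieval to improve its performance on this kind of high-precision memorization task. It currently supports simple BM25 retrieval: BM25 selects 3 intervals, whose content is compressed at a low ratio (2), while the rest is compressed at a high ratio (128). This improves Activation Beacon's performance on needle in a haystack and largely secures an 80% success rate. I used the first book in the PG19 test set; the code is [here](https://gist.github.com/namespace-Pt/0a5a35ba9899304642aecc4e859949c2), please give it a try.
3. Activation Beacon is only our first attempt at long context, and it verifies that compression is feasible. Next we will combine it with more sophisticated retrieval mechanisms to form a systematic long-context solution. Please stay tuned and keep sharing your feedback.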
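The interval selection in point 2 can be sketched roughly as follows. This is a minimal, self-contained illustration and not the code from the gist: the function names `bm25_scores` and `assign_ratios`, the whitespace tokenizer, and the BM25 constants `k1=1.5`, `b=0.75` are all assumptions made for the example. Only the numbers 3, 2, and 128 come from the comment above.

```python
import math
from collections import Counter


def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized interval against the query with plain BM25."""
    n = len(docs_tokens)
    # Guard against an all-empty corpus to avoid dividing by zero.
    avgdl = (sum(len(d) for d in docs_tokens) / n) or 1.0
    # Document frequency: in how many intervals each term appears.
    df = Counter()
    for d in docs_tokens:
        df.update(set(d))
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores


def assign_ratios(query, intervals, top_k=3, low_ratio=2, high_ratio=128):
    """Give the top_k BM25-ranked intervals a low compression ratio.

    Hypothetical helper: every other interval gets the high ratio.
    """
    q = query.lower().split()
    docs = [iv.lower().split() for iv in intervals]
    scores = bm25_scores(q, docs)
    ranked = sorted(range(len(intervals)), key=lambda i: scores[i], reverse=True)
    top = set(ranked[:top_k])
    return [low_ratio if i in top else high_ratio for i in range(len(intervals))]
```

Calling `assign_ratios(question, intervals)` returns one compression ratio per interval, which a compression step could then consume. How the intervals are cut from the book is left open here.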

activation-beacon

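Hi, thanks for your interest.

1. I have updated the code [here](https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/activation_beacon/new); it now supports DeepSpeed stage 3.
2. With 24 GB of GPU memory you will probably need to set `offload_param` and `offload_optimizer` in the ZeRO-3 config, and you will have to tune the details yourself. If the model still does not fit with both enabled, shorten the sequences by setting `max_length` at training time to a value below 8192, such as 7168 or 6144. This will most likely hurt quality, though.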
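For reference, a ZeRO stage 3 configuration with both offload options enabled might look like the sketch below. The keys `zero_optimization`, `stage`, `offload_param`, `offload_optimizer`, `device`, and `pin_memory` are standard DeepSpeed config fields; the file name `zero3_offload.json` and the helper `write_config` are assumptions for illustration, and the repository's own config may differ.

```python
import json

# Minimal ZeRO-3 fragment with parameters and optimizer states offloaded to CPU.
# Other sections (batch size, precision, scheduler) are intentionally left out.
ds_config = {
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu", "pin_memory": True},
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    }
}


def write_config(path="zero3_offload.json"):
    """Write the fragment to disk (hypothetical file name)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(ds_config, f, indent=2)


if __name__ == "__main__":
    print(json.dumps(ds_config, indent=2))
```

Offloading trades GPU memory for slower steps, so it is worth trying one option at a time before enabling both.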

Question about the script that computes msmarco teacher_scores in llm embedder

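Hi, thanks for your interest. On msmarco and nq, the LLM scores are highly consistent with the hard labels: the answer is extracted from the ground-truth document, so the ground-truth document scores very high and the others score very low. Using the LLM scores directly works poorly on these two datasets, so we added an extra optimization: we first trained a reranker with the LLM scores, then trained the retriever with the reranker's scores. The file contains the final reranker scores, and we recommend using them directly.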

Question about the script that computes msmarco teacher_scores in llm embedder

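Hi,

1. Yes.
2. The reranker's scores are better for training the retriever; that does not mean the reranker understands the LLM's preferences better. When distilling a retriever, the ideal teacher signal has both high and low scores spread across candidates (for example, true negatives score low and false negatives score correspondingly higher), rather than all candidates scoring alike, or one candidate scoring very high and the rest very low. On msmarco and nq the answer is copied from the ground-truth passage, so using the answer's generation probability as the teacher score makes the gt passage score very high and everything else fairly low, which limits what distillation can achieve. The reranker works better because, after training, it outputs graded teacher scores (it separates true negatives from false negatives), which makes for a better distillation signal. In addition, the stabilized distillation proposed in our paper targets the same problem. The two work together and, empirically, train a better retriever.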
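The difference between a peaked and a graded teacher can be illustrated with a small sketch. The score values below are made up for the example, and this is plain KL-divergence distillation over a candidate list, not the stabilized distillation from the paper.

```python
import math


def softmax(xs, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def entropy(p, eps=1e-12):
    """Shannon entropy in nats; low entropy means a near one-hot distribution."""
    return -sum(pi * math.log(pi + eps) for pi in p)


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q): the distillation loss with teacher p and student q."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


# Illustrative scores for [gt passage, false negative, true negative, true negative].
peaked_teacher = softmax([12.0, 0.5, 0.3, 0.1])   # gt dominates, rest indistinguishable
graded_teacher = softmax([3.0, 2.2, 0.4, 0.1])    # false negative ranked above true negatives
student = softmax([1.0, 0.2, 0.9, 0.1])

if __name__ == "__main__":
    print("entropy, peaked teacher:", entropy(peaked_teacher))
    print("entropy, graded teacher:", entropy(graded_teacher))
    print("KL(peaked || student):", kl_divergence(peaked_teacher, student))
    print("KL(graded || student):", kl_divergence(graded_teacher, student))
```

With the peaked teacher, the target is almost one-hot, so the loss carries little more than the hard label. The graded teacher additionally tells the student how to order the negatives.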

Question about the script that computes msmarco teacher_scores in llm embedder

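Hi,

a. It is initialized from deberta-large and trained for 1 epoch on the msmarco/nq data. The candidates were the passages with the highest LLM scores and 31 randomly sampled ones, and the model was optimized with KL divergence.
b. Only msmarco and nq.
c. I had not noticed this. It may be related to properties of the deberta model itself.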

Question about the script that computes msmarco teacher_scores in llm embedder

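Hi,

a. No, a separate one is trained for each dataset.
b. Yes. The query and passage are concatenated with a cls token prepended, then the cls `last_hidden_state` is passed through a pooling layer to obtain the score.
c. Still Llama-2-7b-chat.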

Question about the script that computes msmarco teacher_scores in llm embedder

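Hi, training the reranker does not need a teacher/student temperature, and likewise does not need a temperature for the contrastive loss. A contrastive-loss temperature only has to be set when training an embedding model with cosine similarity.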
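One way to see why cosine similarity calls for a temperature: cosine values lie in [-1, 1], so without scaling the softmax over candidates stays fairly flat and the loss cannot approach zero even for a perfect embedding. The sketch below is a generic InfoNCE-style loss written for illustration; the function names and the temperature value 0.02 are assumptions, not settings taken from the repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)


def contrastive_loss(query, positive, negatives, temperature):
    """Cross-entropy over [positive] + negatives, with the positive at index 0."""
    sims = [cosine(query, positive)] + [cosine(query, n) for n in negatives]
    logits = [s / temperature for s in sims]
    # log-sum-exp with max subtraction for numerical stability
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[0]


query = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

if __name__ == "__main__":
    # Same embeddings, two temperatures: only the scaling of the logits differs.
    print("loss at temperature 1.0 :", contrastive_loss(query, positive, negatives, 1.0))
    print("loss at temperature 0.02:", contrastive_loss(query, positive, negatives, 0.02))
```

A reranker outputs an unbounded score, so its logits can already spread as far apart as training requires, which matches the point that it needs no temperature.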

AttributeError: type object 'Dataset' has no attribute 'from_list'

Hi, we use `datasets==2.14.5`. But it should work fine for `datasets>=2.14.5`

Support vLLM for beacon models

Hi, Activation Beacon introduces extra parameters to the Llama model, including the `beacon_embed_tokens.weight` in your issue. You can try to use our [`modeling_llama.py`](https://huggingface.co/namespace-Pt/activation-beacon-llama2-7b-chat/blob/main/modeling_llama.py) instead. Nevertheless, supporting vLLM is currently not...