Tong Zhu (朱桐) comments

Results: 45 comments of Tong Zhu (朱桐)

parameter of weight_gates are not initialized from huggingface checkpoint

This problem depends on the version of `transformers`. I could not reproduce it at first because I was using an older version (`4.31.0`). With `4.47.1`, the problem does occur. After some research on...

Is current megablocks compatible with distributed optimizer in Megatron-LM?

I set up an experiment with 64 experts split across 2 devices using expert parallelism. Both MegaBlocks and the distributed optimizer are enabled. However, I found the saved experts across devices are...

Changing the English dataset to DuEEData

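Hi, thank you for your interest in this project. - Are you referring to DuEE-1.0, the sentence-level event extraction dataset, or DuEE-Fin, the document-level event extraction dataset? As far as I remember, both are Chinese datasets. - Could you provide a sample case and your launch script? Thanks~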

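Understood. You need to reorganize your data into the Doc2EDAG format. For English data, you can refer to the example below (a single data instance). Note that all spans in it are indices computed after whitespace tokenization. In addition, if you are using your own data, you need to adjust the template yourself according to your event type templates; for a possible implementation, see this folder: https://github.com/Spico197/DocEE/tree/main/dee/event_types ``` [ "scenario_en_kairos_14", { "sentences": [ "As of early Tuesday there was no claim of responsibility . Prayuth Chan - ocha , the head of Thailand \u2019 s military...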

Changing the English dataset to DuEEData

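Hi, sorry for the late reply. I don't remember the exact numbers, but as I recall the model did perform poorly on English. Possible reasons: 1. Dataset size: WikiEvents is far smaller than automatically constructed datasets such as ChFinAnn. 2. Encoder: by default, neither Doc2EDAG nor PTPCG uses an off-the-shelf PLM; they only use its vocabulary, which is a real disadvantage when training data is scarce. 3. Tokenizer: English text here is split on whitespace and passed directly to `convert_tokens_to_ids` without being split into subwords, so the input sequence contains many `[UNK]` tokens, which greatly hurts the model's understanding.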
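The tokenizer point can be illustrated with a minimal, self-contained sketch. It is not the DocEE code: the tiny vocabulary, the `wordpiece` helper, and the local `convert_tokens_to_ids` function are all hypothetical stand-ins that only mimic a direct vocabulary lookup versus a greedy longest-match subword split.

```python
# Toy vocabulary (hypothetical); a real PLM vocabulary has tens of thousands of entries.
VOCAB = {
    "[UNK]": 0, "the": 1, "military": 2, "un": 3,
    "##claim": 4, "##ed": 5, "attack": 6, "##s": 7,
}


def convert_tokens_to_ids(tokens):
    """Look each token up directly; anything missing from the vocabulary becomes [UNK]."""
    return [VOCAB.get(token, VOCAB["[UNK]"]) for token in tokens]


def wordpiece(word):
    """Greedy longest-match-first subword split of a single word."""
    pieces, start = [], 0
    while start < len(word):
        end, piece = len(word), None
        while end > start:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # continuation pieces carry the ## prefix
            if candidate in VOCAB:
                piece = candidate
                break
            end -= 1
        if piece is None:
            return ["[UNK]"]  # no piece matches: the whole word is unknown
        pieces.append(piece)
        start = end
    return pieces


words = "the military unclaimed attacks".split()

# Whitespace tokens looked up directly: whole words missing from the vocabulary map to [UNK].
whitespace_ids = convert_tokens_to_ids(words)

# Subword split first, then lookup: the same words are covered by known pieces.
subword_tokens = [piece for word in words for piece in wordpiece(word)]
subword_ids = convert_tokens_to_ids(subword_tokens)

print(whitespace_ids)
print(subword_tokens)
print(subword_ids)
```

With this toy vocabulary, the whitespace path loses two of the four words to `[UNK]`, while the subword path keeps all of them. Note that switching to subwords changes the sequence length, so span indices based on whitespace tokens would have to be realigned.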

Any experiments about the load balancing loss?

Hi there, thanks for your interest in this project~ 1. Yes, the best results are obtained with `Independent-Random`. 2. Since MoE-fying dense models is not a common setting for...

Which code needs to be modified when using o2m-format data?

Hi, sorry for the late reply. To support multiple args, you can refer to the code snippet below, as well as this link: https://github.com/Spico197/DocEE/issues/38#issuecomment-1176207177 ```python from collections import defaultdict from matplotlib import use import torch from dee.event_type import BaseEvent, event_type_fields_list from dee.utils import logger, regex_extractor from .ner import NERExample, NERFeatureConverter from .dee...

Multiple events

Read Me First! Read this before opening an issue about an error!

Running DCFEE