shuiyigt

Results: 3 comments by shuiyigt

With exclusive use of a 3090, VRAM should be sufficient. I also benchmarked the tokenizer on its own; it is a minor cost and should not make total time grow proportionally. Right now the pattern is essentially: bs=20 takes 100, bs=40 takes 200, bs=60 takes 300. Total time is almost exactly proportional to batch size and per-item time is constant, as if everything inside were running serially.
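A minimal timing sketch for isolating this, assuming the FlagEmbedding `FlagModel` API; the checkpoint name, `use_fp16` flag, and test inputs below are illustrative choices, not taken from the thread:

```python
import time

import torch
from FlagEmbedding import FlagModel

# Checkpoint and inputs are assumptions for the sketch; swap in whatever
# model and data reproduce the issue.
model = FlagModel("BAAI/bge-large-zh-v1.5", use_fp16=True)
sentences = ["一段固定长度的测试文本。" * 10] * 60  # equal-length inputs keep batches comparable

for bs in (20, 40, 60):
    batch = sentences[:bs]
    model.encode(batch, batch_size=bs)   # warm-up run, absorbs CUDA init cost
    torch.cuda.synchronize()             # flush pending kernels before timing
    t0 = time.perf_counter()
    model.encode(batch, batch_size=bs)
    torch.cuda.synchronize()             # make sure GPU work is inside the timed window
    dt = time.perf_counter() - t0
    print(f"bs={bs}: {dt:.3f}s total, {dt / bs * 1000:.2f} ms per item")
```

If per-item time stays flat while total time grows linearly, batching is yielding no parallel speedup; with equal-length inputs on a single 3090 that can simply mean the GPU is already compute-saturated at the smallest batch size, so linear scaling by itself is not proof of serial execution inside the model.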

In modeling_baichuan.py, the forward of the BaichuanAttention layer has a bit of code that uses xformers:

```python
if xops is not None and self.training:
    attn_weights = None
    # query_states = query_states.transpose(1, 2)
    # key_states = key_states.transpose(1, 2)
    # value_states = value_states.transpose(1, 2)
    # attn_output = ...
```
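For context, here is a hedged, standalone sketch of how an xformers `memory_efficient_attention` training branch of this shape is typically wired up; the causal `LowerTriangularMask` bias and the tensor sizes are assumptions about common usage, not code quoted from modeling_baichuan.py:

```python
import torch
import xformers.ops as xops

# Sketch only: LowerTriangularMask and the shapes below are assumptions,
# not the actual Baichuan implementation. Requires a CUDA device.
bsz, num_heads, q_len, head_dim = 2, 8, 16, 64

# HF-style attention modules usually hold states as
# (batch, num_heads, seq_len, head_dim).
query_states = torch.randn(bsz, num_heads, q_len, head_dim,
                           device="cuda", dtype=torch.float16)
key_states = torch.randn_like(query_states)
value_states = torch.randn_like(query_states)

# memory_efficient_attention expects (batch, seq_len, num_heads, head_dim),
# which is exactly what the commented-out transposes in the snippet produce.
attn_output = xops.memory_efficient_attention(
    query_states.transpose(1, 2),
    key_states.transpose(1, 2),
    value_states.transpose(1, 2),
    attn_bias=xops.LowerTriangularMask(),  # causal masking for training
)
print(attn_output.shape)  # (bsz, q_len, num_heads, head_dim)
```

The layout comment matters when reading this branch: a 4D input to `memory_efficient_attention` is interpreted as (batch, seq, heads, dim), so the transposes are what put the states into the expected order.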

> @chengzi-big , the transformer architecture itself does not have a length limit; the limitation comes from the positional encoding. We use absolute positional encoding with a length...
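A learned absolute positional encoding is just a lookup table with `max_position_embeddings` rows, so any position id past the table has nothing to look up. A minimal sketch of that failure mode, assuming the BERT-style convention of a 512-entry table (the exact sizes for a given checkpoint live in its config):

```python
import torch
import torch.nn as nn

# Illustrative sizes following the common BERT-style convention; the real
# values come from the model's config, not from this thread.
max_position_embeddings, hidden_size = 512, 768
pos_emb = nn.Embedding(max_position_embeddings, hidden_size)

ids_ok = torch.arange(512)        # positions 0..511: every id has a row
print(pos_emb(ids_ok).shape)      # torch.Size([512, 768])

ids_too_long = torch.arange(513)  # position 512 has no embedding row
try:
    pos_emb(ids_too_long)
except IndexError as e:
    print("out of range:", e)     # longer inputs must be truncated or chunked
```

This is why encodings that do not use a learned table, such as rotary or ALiBi, tend to extrapolate to longer inputs more gracefully.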