shuiyigt

Results: 3 comments by shuiyigt

With exclusive use of a 3090, VRAM should be sufficient. I also benchmarked the tokenizer on its own; it is a minor cost and should not make total time grow proportionally. Right now the pattern is essentially: bs=20 takes 100, bs=40 takes 200, bs=60 takes 300. Total time is almost exactly proportional to batch size and per-item time is constant, as if everything inside were running serially.
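A minimal timing sketch for isolating this, assuming the FlagEmbedding `FlagModel` API; the checkpoint name, `use_fp16` flag, and test inputs below are illustrative choices, not taken from the thread:

```python
import time

import torch
from FlagEmbedding import FlagModel

# Checkpoint and inputs are assumptions for the sketch; swap in whatever
# model and data reproduce the issue.
model = FlagModel("BAAI/bge-large-zh-v1.5", use_fp16=True)
sentences = ["一段固定长度的测试文本。" * 10] * 60  # equal-length inputs keep batches comparable

for bs in (20, 40, 60):
    batch = sentences[:bs]
    model.encode(batch, batch_size=bs)   # warm-up run, absorbs CUDA init cost
    torch.cuda.synchronize()             # flush pending kernels before timing
    t0 = time.perf_counter()
    model.encode(batch, batch_size=bs)
    torch.cuda.synchronize()             # make sure GPU work is inside the timed window
    dt = time.perf_counter() - t0
    print(f"bs={bs}: {dt:.3f}s total, {dt / bs * 1000:.2f} ms per item")
```

If per-item time stays flat while total time grows linearly, batching is yielding no parallel speedup; with equal-length inputs on a single 3090 that can simply mean the GPU is already compute-saturated at the smallest batch size, so linear scaling by itself is not proof of serial execution inside the model.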

In modeling_baichuan.py, the forward of the BaichuanAttention layer has a bit of code that uses xformers:

```python
if xops is not None and self.training:
    attn_weights = None
    # query_states = query_states.transpose(1, 2)
    # key_states = key_states.transpose(1, 2)
    # value_states = value_states.transpose(1, 2)
    # attn_output = ...
```
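For context, here is a hedged, standalone sketch of how an xformers `memory_efficient_attention` training branch of this shape is typically wired up; the causal `LowerTriangularMask` bias and the tensor sizes are assumptions about common usage, not code quoted from modeling_baichuan.py:

```python
import torch
import xformers.ops as xops

# Sketch only: LowerTriangularMask and the shapes below are assumptions,
# not the actual Baichuan implementation. Requires a CUDA device.
bsz, num_heads, q_len, head_dim = 2, 8, 16, 64

# HF-style attention modules usually hold states as
# (batch, num_heads, seq_len, head_dim).
query_states = torch.randn(bsz, num_heads, q_len, head_dim,
                           device="cuda", dtype=torch.float16)
key_states = torch.randn_like(query_states)
value_states = torch.randn_like(query_states)

# memory_efficient_attention expects (batch, seq_len, num_heads, head_dim),
# which is exactly what the commented-out transposes in the snippet produce.
attn_output = xops.memory_efficient_attention(
    query_states.transpose(1, 2),
    key_states.transpose(1, 2),
    value_states.transpose(1, 2),
    attn_bias=xops.LowerTriangularMask(),  # causal masking for training
)
print(attn_output.shape)  # (bsz, q_len, num_heads, head_dim)
```

The layout comment matters when reading this branch: a 4D input to `memory_efficient_attention` is interpreted as (batch, seq, heads, dim), so the transposes are what put the states into the expected order.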

> @chengzi-big , the transformer architecture itself does not have a length limit; the limitation comes from the positional encoding. We use absolute positional encoding with a length...
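A learned absolute positional encoding is just a lookup table with `max_position_embeddings` rows, so any position id past the table has nothing to look up. A minimal sketch of that failure mode, assuming the BERT-style convention of a 512-entry table (the exact sizes for a given checkpoint live in its config):

```python
import torch
import torch.nn as nn

# Illustrative sizes following the common BERT-style convention; the real
# values come from the model's config, not from this thread.
max_position_embeddings, hidden_size = 512, 768
pos_emb = nn.Embedding(max_position_embeddings, hidden_size)

ids_ok = torch.arange(512)        # positions 0..511: every id has a row
print(pos_emb(ids_ok).shape)      # torch.Size([512, 768])

ids_too_long = torch.arange(513)  # position 512 has no embedding row
try:
    pos_emb(ids_too_long)
except IndexError as e:
    print("out of range:", e)     # longer inputs must be truncated or chunked
```

This is why encodings that do not use a learned table, such as rotary or ALiBi, tend to extrapolate to longer inputs more gracefully.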